This SpeechHome open class invites Guoguo Chen to present "Speech Recognition Development: A Dataset and Benchmark Perspective".
Topic: Speech Recognition Development: A Dataset and Benchmark Perspective
Time: December 15 (Thursday), 14:00–15:00
Guoguo Chen
Dr. Guoguo Chen holds a Ph.D. degree in Electrical and Computer Engineering from Johns Hopkins University and a B.Eng. degree in Electronic Engineering from Tsinghua University. During his Ph.D., he spent 5 years at the Center for Language and Speech Processing, Johns Hopkins University, where he worked on various aspects of speech recognition and was one of the key contributors to the open source speech recognition toolkit Kaldi and the open source deep learning toolkit CNTK. He was an author of LibriSpeech, one of the most cited (3,500+ Google Scholar citations) speech recognition datasets and benchmarks. He also spent two summers at Google Inc., where he developed the prototype of Android's wake word detection engine for "Okay Google", serving billions of Android/Google Home users. After graduation, Dr. Chen co-founded KITT.AI, a 2017 CBInsights AI 100 company funded by Amazon's Alexa Fund, Paul Allen's Allen Institute for Artificial Intelligence, Madrona Venture Group, Founders' Co-op, and A Level Capital. The company released two products: a customizable wake word engine and a conversational AI toolkit. It had more than 100,000 developers and customers across 20 countries on 4 continents. In 2017, KITT.AI was acquired by Baidu, which set up its first Seattle office following the KITT.AI deal. In 2020, Dr. Chen co-founded Seasalt.ai. Dr. Chen also initiated SpeechColab, a volunteer organization for the speech recognition community, which released one of the largest speech recognition datasets, GigaSpeech, covering 10,000 hours of transcribed audio and 33,000 hours of total audio for speech recognition research.
The previous decade saw remarkable development in automatic speech recognition technologies. While there are many technical articles explaining these improvements from the model point of view, the impact of datasets and benchmarks on speech recognition development is not well studied. In this talk, we first investigate the contribution of datasets and benchmarks to speech recognition development. We then introduce a large-scale English speech recognition dataset named GigaSpeech. We will demonstrate the data creation pipeline, as well as initial benchmark results on this dataset. Finally, we close the talk by outlining our ongoing work on speech recognition benchmarks.
The event will be live-streamed via CSDN and can be watched on both mobile and PC.
[SpeechHome Open Class] SRD: A Dataset and Benchmark Perspective — CSDN Live
On December 15, one SpeechHome baseball cap and one AISHELL 5th-anniversary plush toy will be given away in the live stream room; interact during the live stream for a chance to win.