[Seminar] Recent advances in Google speech recognition systems
■ Host: Prof. Gunhee Kim (x7300, 880-7300)
■ Contact: Vision and Learning Lab (02-880-7289)
In this talk, we will discuss recent advances in speech recognition techniques using neural networks and very large training sets. First, we will give an overview of recent acoustic model training techniques, such as Cross Entropy (CE) training, Connectionist Temporal Classification (CTC), and discriminative sequence training criteria such as state-level Minimum Bayes Risk (sMBR). We will also describe how to model the acoustic feature distribution using Feed-Forward Deep Neural Networks (FF-DNNs) and Long Short-Term Memory networks (LSTMs). Next, we will discuss simulated data generation and semi-supervised training. We usually do not have enough data for new speech recognition domains; to address this, we create large-scale acoustically simulated databases from existing data. For very large training sets, labeling has always been a time-consuming and difficult problem, and we discuss semi-supervised training techniques for generating labels in such cases. Finally, we will look into end-to-end neural recognizers that combine the Acoustic Model (AM), the Language Model (LM), and the Pronunciation Model (PM) into a single end-to-end system. We describe attention-based approaches, CTC-based approaches, and the Recurrent Neural Network (RNN) transducer, and compare their performance.
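To illustrate the CTC-based approaches mentioned above, here is a minimal sketch (not the speaker's code) of greedy CTC decoding: take the most likely symbol per frame, then collapse consecutive repeats and drop the blank symbol. The frame-level labels and blank symbol used here are hypothetical.

```python
BLANK = "-"  # hypothetical CTC blank symbol

def ctc_greedy_decode(frame_labels):
    """Collapse a per-frame label sequence into a CTC output string:
    drop symbols that repeat consecutively, then remove blanks."""
    decoded = []
    prev = None
    for label in frame_labels:
        # Emit a symbol only when it differs from the previous frame
        # and is not the blank.
        if label != prev and label != BLANK:
            decoded.append(label)
        prev = label
    return "".join(decoded)

# Example: 8 frames of per-frame argmax labels for the word "cat".
print(ctc_greedy_decode(["c", "c", "-", "a", "a", "-", "t", "t"]))  # cat
```

Note how the blank lets CTC represent genuinely repeated output symbols: a blank between two identical labels (e.g. `["l", "-", "l"]`) prevents them from being collapsed into one.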
Chanwoo Kim has been a senior software engineer at Google, Inc. since 2013. He has been working on acoustic modeling for Google speech recognition systems, including Google Home, and on enhancing noise robustness using deep learning techniques. He was a speech scientist at Microsoft from 2011 to 2013. Dr. Kim received a Ph.D. from the Language Technologies Institute, School of Computer Science, Carnegie Mellon University in 2010. He received his B.S. and M.S. degrees in Electrical Engineering from Seoul National University in 1998 and 2001, respectively. Dr. Kim's doctoral research focused on enhancing the robustness of automatic speech recognition systems in noisy environments. Between 2003 and 2005, Dr. Kim was a Senior Research Engineer at LG Electronics, where he worked primarily on embedded signal processing and protocol stacks for multimedia systems. Prior to his employment at LG, he worked for EdumediaTek and SK Teletech as an R&D engineer.