{"id":169434,"date":"2004-01-29T16:42:42","date_gmt":"2004-01-29T16:42:42","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/project\/acoustic-modeling\/"},"modified":"2019-08-14T14:50:04","modified_gmt":"2019-08-14T21:50:04","slug":"acoustic-modeling","status":"publish","type":"msr-project","link":"https:\/\/www.microsoft.com\/en-us\/research\/project\/acoustic-modeling\/","title":{"rendered":"Acoustic Modeling"},"content":{"rendered":"
\n

Acoustic modeling of speech typically refers to the process of establishing statistical representations for the feature vector sequences computed from the speech waveform. The hidden Markov model (HMM) is one of the most common types of acoustic model. Other acoustic models include segmental models, super-segmental models (including hidden dynamic models), neural networks, maximum entropy models, and (hidden) conditional random fields.<\/p>\n

Acoustic modeling also encompasses “pronunciation modeling”, which describes how one or more sequences of fundamental speech units (such as phones or phonetic features) are used to represent larger speech units, such as words or phrases, that are the object of speech recognition. Acoustic modeling may also include the use of feedback information from the recognizer to reshape the feature vectors of speech in order to achieve noise robustness in speech recognition.<\/p>\n

Speech recognition engines usually require two basic components in order to recognize speech. One component is an acoustic model, created by taking audio recordings of speech and their transcriptions and then compiling them into statistical representations of the sounds that make up words. The other component is a language model, which gives the probabilities of sequences of words. Language models are often used for dictation applications. A special type of language model is the regular grammar, which is typically used in desktop command-and-control or telephony IVR-type applications.<\/p>\n

Our group has been working on acoustic modeling since its inception because of its critical importance to speech technology, and to speech recognition in particular. We have world-class expertise and researchers in this area. Recently, we have been focusing on two aspects of acoustic modeling: 1) how to establish the statistical models and their structures; and 2) how to learn the model parameters automatically from the data. The following are some of our recent projects in the area of acoustic modeling:<\/p>\n