{"id":169434,"date":"2004-01-29T16:42:42","date_gmt":"2004-01-29T16:42:42","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/project\/acoustic-modeling\/"},"modified":"2019-08-14T14:50:04","modified_gmt":"2019-08-14T21:50:04","slug":"acoustic-modeling","status":"publish","type":"msr-project","link":"https:\/\/www.microsoft.com\/en-us\/research\/project\/acoustic-modeling\/","title":{"rendered":"Acoustic Modeling"},"content":{"rendered":"<div id=\"en-usprojectsacoustic-modelingdefault\" class=\"page-content\">\n<p>Acoustic modeling of speech typically refers to the process of\u00a0establishing statistical\u00a0representations for the feature vector sequences\u00a0computed from the speech waveform. The hidden Markov model (HMM) is one of the most common types of acoustic model. Other acoustic models include segmental models, super-segmental models (including hidden dynamic models), neural networks, maximum entropy models, and (hidden) conditional random fields.<\/p>\n<p>Acoustic modeling also encompasses &#8220;pronunciation modeling&#8221;, which describes how one or more sequences of fundamental speech units\u00a0(such as phones or phonetic features) are used to represent larger speech units, such as words or phrases, that are the object of speech recognition.\u00a0Acoustic modeling may also include the use of feedback information from the recognizer to reshape the feature vectors of speech to achieve noise robustness in speech recognition.<\/p>\n<p>Speech recognition engines usually require two basic components in order to recognize speech.\u00a0One component\u00a0is\u00a0an acoustic model,\u00a0created by taking audio recordings of speech and their transcriptions and then compiling them into statistical representations of the sounds that make up words. The other component is\u00a0a language model, which\u00a0gives the probabilities of sequences of words.\u00a0Language models are often\u00a0used for dictation applications. 
A special\u00a0type of language model is\u00a0the regular grammar, which\u00a0is typically used in desktop command-and-control or telephony\u00a0IVR-type applications.<\/p>\n<p>Our group has been working on acoustic modeling since its inception because of its critical importance to speech technology, and to speech recognition in particular. We have world-class expertise and researchers\u00a0in this area. Recently, we have\u00a0been\u00a0focusing on\u00a0two aspects of acoustic modeling: 1)\u00a0how to establish the statistical models\u00a0and their structures; and 2) how to learn the model parameters automatically from the data. The following are some of our recent projects in the area of acoustic modeling:<\/p>\n<ul>\n<li>Discriminative Learning Algorithms and Procedures for Acoustic Models of Speech<\/li>\n<li>Large-Margin Learning of HMM Parameters<\/li>\n<li>Discriminative pronunciation modeling<\/li>\n<li>Joint discriminative learning of SLU and SR model parameters using N-best\/lattice results from the speech recognizer<\/li>\n<li>Discriminative acoustic models for Speech Recognition via the use of continuous features in CRF and HCRF<\/li>\n<li>Acoustic feature enhancement by statistical methods with feedback from speech recognition<\/li>\n<li>Compressing HMM parameters for adaptive noise-robust speech recognition<\/li>\n<li>Noise-adaptive and speaker-adaptive\u00a0training of HMM parameters<\/li>\n<li>Parametric modeling of acoustic environment with mixing phases between speech and noise for speech recognition<\/li>\n<li>Multilingual and cross-lingual speech recognition<\/li>\n<li>Cross-Lingual Speech Recognition under Runtime Resource Constraints<\/li>\n<li>Modeling speech production mechanisms for speech recognition: hidden dynamic modeling; minimum-effort principle for model learning and decoding<\/li>\n<li>Acoustic modeling for casual speech for enhanced voicemail<\/li>\n<li>Active learning for speech recognition<\/li>\n<li>Unsupervised learning for 
speech recognition<\/li>\n<li>Variable-Parameter HMMs<\/li>\n<li>Acoustic modeling for voice search<\/li>\n<\/ul>\n<\/div>\n<div id=\"en-usprojectsacoustic-modelingdefault\" class=\"page-content\">\n<p><a href=\"https:\/\/www.microsoft.com\/en-us\/research\/group\/speech-dialog-research-group\/\">Speech Technology Home<\/a><\/p>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>Acoustic modeling of speech typically refers to the process of\u00a0establishing statistical\u00a0representations for the feature vector sequences\u00a0computed from the speech waveform. The hidden Markov model (HMM) is one of the most common types of acoustic model. Other acoustic models include segmental models, super-segmental models (including hidden dynamic models), neural networks, maximum entropy models, and (hidden) conditional random fields. [&hellip;]<\/p>\n","protected":false},"featured_media":0,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","footnotes":""},"research-area":[13556,13554],"msr-locale":[268875],"msr-impact-theme":[],"msr-pillar":[],"class_list":["post-169434","msr-project","type-msr-project","status-publish","hentry","msr-research-area-artificial-intelligence","msr-research-area-human-computer-interaction","msr-locale-en_us","msr-archive-status-active"],"msr_project_start":"2004-01-29","related-publications":[156416,156920,156917,156914,156913,156708,156694,156641,156452,156418,157228,156415,156414,156413,156412,156411,156410,156409,156408,156407,160593,371573,165314,164905,164889,164888,164887,161705,160726,160690,156406,160592,158612,158232,158186,157806,157804,157721,157698,155990,156374,156373,156371,156370,156369,156176,156170,156169,156168,156375,155941,155638,155399,155389,155315,155122,154792,154606,156384,156405,156404,156403,156402,156401,156400,156399,156386,156385,154302,156383,156382,156381,156380,156379,156378,156377,156376],"relat
ed-downloads":[],"related-videos":[],"related-groups":[],"related-events":[],"related-opportunities":[],"related-posts":[],"related-articles":[],"tab-content":[],"related-researchers":[],"msr_research_lab":[],"msr_impact_theme":[],"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/169434","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-project"}],"version-history":[{"count":2,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/169434\/revisions"}],"predecessor-version":[{"id":603582,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/169434\/revisions\/603582"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=169434"}],"wp:term":[{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=169434"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=169434"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=169434"},{"taxonomy":"msr-pillar","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-pillar?post=169434"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}