{"id":156377,"date":"2009-04-01T00:00:00","date_gmt":"2009-04-01T00:00:00","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/msr-research-item\/a-study-on-multilingual-acoustic-modeling-for-large-vocabulary-asr\/"},"modified":"2018-10-16T20:22:53","modified_gmt":"2018-10-17T03:22:53","slug":"a-study-on-multilingual-acoustic-modeling-for-large-vocabulary-asr","status":"publish","type":"msr-research-item","link":"https:\/\/www.microsoft.com\/en-us\/research\/publication\/a-study-on-multilingual-acoustic-modeling-for-large-vocabulary-asr\/","title":{"rendered":"A Study on Multilingual Acoustic Modeling For Large Vocabulary ASR"},"content":{"rendered":"
\n

We study key issues related to multilingual acoustic modeling for automatic speech recognition (ASR) through a series of large-scale ASR experiments. Our study explores shared structures embedded in a large collection of speech data spanning over a number of spoken languages in order to establish a common set of universal phone models that can be used for large vocabulary ASR of all the languages seen or unseen during training. Language-universal and language-adaptive models are compared with language-specific models, and the comparison results show that in many cases it is possible to build general-purpose language-universal and language-adaptive acoustic models that outperform language-specific ones if the set of shared units, the structure of shared states, and the shared acoustic-phonetic properties among different languages can be properly utilized. Specifically, our results demonstrate that when the context coverage is poor in language-specific training, we can use one tenth of the adaptation data to achieve equivalent performance in cross-lingual speech recognition.<\/p>\n<\/div>\n

<\/p>\n","protected":false},"excerpt":{"rendered":"

We study key issues related to multilingual acoustic modeling for automatic speech recognition (ASR) through a series of large-scale ASR experiments. Our study explores shared structures embedded in a large collection of speech data spanning over a number of spoken languages in order to establish a common set of universal phone models that can be […]<\/p>\n","protected":false},"featured_media":0,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","footnotes":""},"msr-content-type":[3],"msr-research-highlight":[],"research-area":[13556,13545,13554],"msr-publication-type":[193716],"msr-product-type":[],"msr-focus-area":[],"msr-platform":[],"msr-download-source":[],"msr-locale":[268875],"msr-post-option":[],"msr-field-of-study":[],"msr-conference":[],"msr-journal":[],"msr-impact-theme":[],"msr-pillar":[],"class_list":["post-156377","msr-research-item","type-msr-research-item","status-publish","hentry","msr-research-area-artificial-intelligence","msr-research-area-human-language-technologies","msr-research-area-human-computer-interaction","msr-locale-en_us"],"msr_publishername":"Institute of Electrical and Electronics Engineers, Inc.","msr_edition":"Proceedings of the ICASSP","msr_affiliation":"","msr_published_date":"2009-04-01","msr_host":"","msr_duration":"","msr_version":"","msr_speaker":"","msr_other_contributors":"","msr_booktitle":"","msr_pages_string":"","msr_chapter":"","msr_isbn":"","msr_journal":"","msr_volume":"","msr_number":"","msr_editors":"","msr_series":"","msr_issue":"","msr_organization":"","msr_how_published":"","msr_notes":"","msr_highlight_text":"","msr_release_tracker_id":"","msr_original_fields_of_study":"","msr_download_urls":"","msr_external_url":"","msr_secondary_video_url":"","msr_longbiography":"","msr_microsoftintellectualproperty":1,"msr_main_download":"207741","msr_publicationurl":"","msr_doi":"","msr_publication_uploader":[{"type":"file","title":"0004333.pdf","viewUrl":"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2016\/02\/0004333.pdf","id":207741,"label_id":0}],"msr_related_uploader":"","msr_attachments":[],"msr-author-ordering":[{"type":"text","value":"Hui Lin","user_id":0,"rest_url":false},{"type":"user_nicename","value":"deng","user_id":31602,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=deng"},{"type":"text","value":"Chi-Hui Lee","user_id":0,"rest_url":false},{"type":"user_nicename","value":"dongyu","user_id":31667,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=dongyu"},{"type":"user_nicename","value":"alexac","user_id":30932,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=alexac"},{"type":"user_nicename","value":"ygong","user_id":34994,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=ygong"}],"msr_impact_theme":[],"msr_research_lab":[],"msr_event":[],"msr_group":[],"msr_project":[169434],"publication":[],"video":[],"download":[],"msr_publication_type":"inproceedings","related_content":{"projects":[{"ID":169434,"post_title":"Acoustic Modeling","post_name":"acoustic-modeling","post_type":"msr-project","post_date":"2004-01-29 16:42:42","post_modified":"2019-08-14 14:50:04","post_status":"publish","permalink":"https:\/\/www.microsoft.com\/en-us\/research\/project\/acoustic-modeling\/","post_excerpt":"Acoustic modeling of speech typically refers to the process of\u00a0establishing statistical\u00a0representations for the feature vector sequences\u00a0computed from the speech waveform. Hidden Markov Model (HMM) is one most common type of acoustuc models. Other acosutic models include segmental models, super-segmental models (including hidden dynamic models), neural networks, maximum entropy models, and (hidden) conditional random fields, etc. Acoustic modeling also encompasses \"pronunciation modeling\", which describes how a sequence or multi-sequences of fundamental speech units\u00a0(such as phones or…","_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/169434"}]}}]},"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/156377"}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-research-item"}],"version-history":[{"count":1,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/156377\/revisions"}],"predecessor-version":[{"id":433083,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/156377\/revisions\/433083"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=156377"}],"wp:term":[{"taxonomy":"msr-content-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-content-type?post=156377"},{"taxonomy":"msr-research-highlight","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-highlight?post=156377"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=156377"},{"taxonomy":"msr-publication-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-publication-type?post=156377"},{"taxonomy":"msr-product-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-product-type?post=156377"},{"taxonomy":"msr-focus-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-focus-area?post=156377"},{"taxonomy":"msr-platform","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-platform?post=156377"},{"taxonomy":"msr-download-source","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-download-source?post=156377"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=156377"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=156377"},{"taxonomy":"msr-field-of-study","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-field-of-study?post=156377"},{"taxonomy":"msr-conference","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-conference?post=156377"},{"taxonomy":"msr-journal","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-journal?post=156377"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=156377"},{"taxonomy":"msr-pillar","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-pillar?post=156377"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}