{"id":164509,"date":"2013-05-01T00:00:00","date_gmt":"2013-05-01T00:00:00","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/msr-research-item\/articulatory-features-for-large-vocabulary-speech-recognition\/"},"modified":"2018-10-16T20:20:01","modified_gmt":"2018-10-17T03:20:01","slug":"articulatory-features-for-large-vocabulary-speech-recognition","status":"publish","type":"msr-research-item","link":"https:\/\/www.microsoft.com\/en-us\/research\/publication\/articulatory-features-for-large-vocabulary-speech-recognition\/","title":{"rendered":"Articulatory features for large vocabulary speech recognition"},"content":{"rendered":"
\n

Studies have demonstrated that articulatory information can model speech variability effectively and can potentially help to improve speech recognition performance. Most studies involving articulatory information have focused on estimating it effectively from speech, and few have actually used such features for speech recognition. Speech recognition studies using articulatory information have been mostly confined to digit or medium-vocabulary tasks, and efforts to incorporate it into large-vocabulary systems have been limited. We present a neural network model that estimates articulatory trajectories from speech signals; the model was trained on synthetic speech signals generated by Haskins Laboratories\u2019 task-dynamic model of speech production. The trained model was applied to natural speech, and the estimated articulatory trajectories were used in conjunction with standard cepstral features to train acoustic models for large-vocabulary recognition systems. Two different large-vocabulary English datasets were used in the experiments reported here. Results indicate that employing articulatory information improves speech recognition performance not only under clean conditions but also under noisy background conditions. Perceptually motivated robust features were also explored in this study, and the best performance was obtained when systems based on articulatory, standard cepstral, and perceptually motivated features were all combined.<\/p>\n<\/div>\n

<\/p>\n","protected":false},"excerpt":{"rendered":"

Studies have demonstrated that articulatory information can model speech variability effectively and can potentially help to improve speech recognition performance. Most of the studies involving articulatory information have focused on effectively estimating them from speech, and few studies have actually used such features for speech recognition. Speech recognition studies using articulatory information have been mostly […]<\/p>\n","protected":false},"featured_media":0,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","footnotes":""},"msr-content-type":[3],"msr-research-highlight":[],"research-area":[13545],"msr-publication-type":[193716],"msr-product-type":[],"msr-focus-area":[],"msr-platform":[],"msr-download-source":[],"msr-locale":[268875],"msr-post-option":[],"msr-field-of-study":[],"msr-conference":[],"msr-journal":[],"msr-impact-theme":[],"msr-pillar":[],"class_list":["post-164509","msr-research-item","type-msr-research-item","status-publish","hentry","msr-research-area-human-language-technologies","msr-locale-en_us"],"msr_publishername":"IEEE SPS","msr_edition":"Proc. 
IEEE ICASSP","msr_affiliation":"","msr_published_date":"2013-05-01","msr_host":"","msr_duration":"","msr_version":"","msr_speaker":"","msr_other_contributors":"","msr_booktitle":"","msr_pages_string":"7145-7149","msr_chapter":"","msr_isbn":"","msr_journal":"","msr_volume":"","msr_number":"","msr_editors":"","msr_series":"","msr_issue":"","msr_organization":"","msr_how_published":"","msr_notes":"","msr_highlight_text":"","msr_release_tracker_id":"","msr_original_fields_of_study":"","msr_download_urls":"","msr_external_url":"","msr_secondary_video_url":"","msr_longbiography":"","msr_microsoftintellectualproperty":1,"msr_main_download":"205486","msr_publicationurl":"","msr_doi":"","msr_publication_uploader":[{"type":"file","title":"ICASSP13_MODTVs_v10.pdf","viewUrl":"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2016\/02\/ICASSP13_MODTVs_v10.pdf","id":205486,"label_id":0}],"msr_related_uploader":"","msr_attachments":[{"id":205486,"url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2016\/02\/ICASSP13_MODTVs_v10.pdf"}],"msr-author-ordering":[{"type":"text","value":"Vikramjit Mitra","user_id":0,"rest_url":false},{"type":"text","value":"Wen Wang","user_id":0,"rest_url":false},{"type":"user_nicename","value":"anstolck","user_id":31054,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=anstolck"},{"type":"text","value":"Hosung Nam","user_id":0,"rest_url":false},{"type":"text","value":"Colleen Richey","user_id":0,"rest_url":false},{"type":"text","value":"Jiahong Yuan","user_id":0,"rest_url":false},{"type":"text","value":"Mark Liberman","user_id":0,"rest_url":false}],"msr_impact_theme":[],"msr_research_lab":[],"msr_event":[],"msr_group":[],"msr_project":[320309],"publication":[],"video":[],"download":[],"msr_publication_type":"inproceedings","related_content":{"projects":[{"ID":320309,"post_title":"Speech Technology for Computational Phonetics and Reading 
Assessment","post_name":"speech-technology-corpus-based-phonetics","post_type":"msr-project","post_date":"2016-11-11 18:50:01","post_modified":"2017-06-19 09:42:28","post_status":"publish","permalink":"https:\/\/www.microsoft.com\/en-us\/research\/project\/speech-technology-corpus-based-phonetics\/","post_excerpt":"This project aims to develop new tools for phonetics research on large speech corpora without requiring traditional phonetic annotations by humans.\u00a0 The idea is to\u00a0adapt tools from speech recognition to replace the costly and time-consuming annotations usually required for phonetics research. This project was originally started by an NSF grant \"New tools and methods for very-large-scale phonetics research\" to UPenn\u00a0and SRI, with a Microsoft researcher as a consultant. More recently, work on computational phonetics has…","_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/320309"}]}}]},"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/164509","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-research-item"}],"version-history":[{"count":1,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/164509\/revisions"}],"predecessor-version":[{"id":526771,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/164509\/revisions\/526771"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=164509"}],"wp:term":[{"taxonomy":"msr-content-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-content-type?post=164509"},{"taxonomy":"msr-research-highlight","embeddable":true,"href":"https:\/\/www.microsoft
.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-highlight?post=164509"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=164509"},{"taxonomy":"msr-publication-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-publication-type?post=164509"},{"taxonomy":"msr-product-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-product-type?post=164509"},{"taxonomy":"msr-focus-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-focus-area?post=164509"},{"taxonomy":"msr-platform","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-platform?post=164509"},{"taxonomy":"msr-download-source","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-download-source?post=164509"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=164509"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=164509"},{"taxonomy":"msr-field-of-study","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-field-of-study?post=164509"},{"taxonomy":"msr-conference","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-conference?post=164509"},{"taxonomy":"msr-journal","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-journal?post=164509"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=164509"},{"taxonomy":"msr-pillar","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-pillar?post=16450
9"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}