{"id":595954,"date":"2019-06-28T19:05:46","date_gmt":"2019-06-29T02:05:46","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-research-item&p=595954"},"modified":"2021-10-17T23:01:17","modified_gmt":"2021-10-18T06:01:17","slug":"meeting-transcription-using-asynchronous-distant-microphones","status":"publish","type":"msr-research-item","link":"https:\/\/www.microsoft.com\/en-us\/research\/publication\/meeting-transcription-using-asynchronous-distant-microphones\/","title":{"rendered":"Meeting Transcription Using Asynchronous Distant Microphones"},"content":{"rendered":"
We describe a system that generates speaker-annotated transcripts of meetings by using multiple asynchronous distant microphones. The system is composed of continuous audio stream alignment, blind beamforming, speech recognition, speaker diarization, and system combination. While the idea of improving the meeting transcription accuracy by leveraging multiple recordings has been investigated in certain specific technology areas such as beamforming, our objective is to assess the feasibility of a complete system with a set of mobile devices and conduct a detailed analysis. With seven input audio streams, our system achieves a word error rate (WER) of 22.3% and a speaker-attributed WER (SAWER) of 26.7%, and comes within 3% of the close-talking microphone WER on non-overlapping speech. The relative gains in SAWER over a single-device system are 14.8%, 20.3%, and 22.4% for three, five, and seven microphones, respectively.\u00a0 The full system achieves a 13.6% diarization error rate, 10% of which are due to overlapped speech.<\/p>\n","protected":false},"excerpt":{"rendered":"
We describe a system that generates speaker-annotated transcripts of meetings by using multiple asynchronous distant microphones. The system is composed of continuous audio stream alignment, blind beamforming, speech recognition, speaker diarization, and system combination. While the idea of improving the meeting transcription accuracy by leveraging multiple recordings has been investigated in certain specific technology areas […]<\/p>\n","protected":false},"featured_media":0,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","footnotes":""},"msr-content-type":[3],"msr-research-highlight":[],"research-area":[243062,13545],"msr-publication-type":[193716],"msr-product-type":[],"msr-focus-area":[],"msr-platform":[],"msr-download-source":[],"msr-locale":[268875],"msr-post-option":[],"msr-field-of-study":[],"msr-conference":[],"msr-journal":[],"msr-impact-theme":[],"msr-pillar":[],"class_list":["post-595954","msr-research-item","type-msr-research-item","status-publish","hentry","msr-research-area-audio-acoustics","msr-research-area-human-language-technologies","msr-locale-en_us"],"msr_publishername":"","msr_edition":"","msr_affiliation":"","msr_published_date":"2019-9-15","msr_host":"","msr_duration":"","msr_version":"","msr_speaker":"","msr_other_contributors":"","msr_booktitle":"","msr_pages_string":"","msr_chapter":"","msr_isbn":"","msr_journal":"","msr_volume":"","msr_number":"","msr_editors":"","msr_series":"","msr_issue":"","msr_organization":"ISCA","msr_how_published":"","msr_notes":"","msr_highlight_text":"","msr_release_tracker_id":"","msr_original_fields_of_study":"","msr_download_urls":"","msr_external_url":"","msr_secondary_video_url":"","msr_longbiography":"","msr_microsoftintellectualproperty":1,"msr_main_download":"","msr_publicationurl":"","msr_doi":"","msr_publication_uploader":[{"type":"file","viewUrl":"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/06\/meeting_transcription_using_asynchronous_distant_microphones.pdf","id":"596137","title":"meeting_transcription_using_asynchronous_distant_microphones","label_id":"243109","label":0}],"msr_related_uploader":"","msr_attachments":[{"id":596137,"url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/07\/meeting_transcription_using_asynchronous_distant_microphones.pdf"}],"msr-author-ordering":[{"type":"user_nicename","value":"Takuya Yoshioka","user_id":36278,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Takuya Yoshioka"},{"type":"user_nicename","value":"Dimitrios Dimitriadis","user_id":37521,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Dimitrios Dimitriadis"},{"type":"user_nicename","value":"Andreas Stolcke","user_id":31054,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Andreas Stolcke"},{"type":"user_nicename","value":"William Hinthorn","user_id":38326,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=William Hinthorn"},{"type":"text","value":"Zhuo Chen","user_id":0,"rest_url":false},{"type":"user_nicename","value":"Michael Zeng","user_id":33141,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Michael Zeng"},{"type":"user_nicename","value":"Xuedong Huang","user_id":34869,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Xuedong Huang"}],"msr_impact_theme":[],"msr_research_lab":[199565],"msr_event":[],"msr_group":[664548,783091],"msr_project":[585154,171185],"publication":[],"video":[],"download":[],"msr_publication_type":"inproceedings","related_content":{"projects":[{"ID":585154,"post_title":"Project Denmark","post_name":"project-denmark","post_type":"msr-project","post_date":"2019-05-09 13:13:15","post_modified":"2020-11-12 13:43:43","post_status":"publish","permalink":"https:\/\/www.microsoft.com\/en-us\/research\/project\/project-denmark\/","post_excerpt":"The goal of Project Denmark is to move beyond the need for traditional microphone arrays, such as those supported by Microsoft\u2019s Speech Devices SDK, to achieve high-quality capture of meeting conversations.","_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/585154"}]}},{"ID":171185,"post_title":"Meeting Recognition and Understanding","post_name":"meeting-recognition-and-understanding","post_type":"msr-project","post_date":"2013-07-30 14:28:35","post_modified":"2023-08-12 21:11:41","post_status":"publish","permalink":"https:\/\/www.microsoft.com\/en-us\/research\/project\/meeting-recognition-and-understanding\/","post_excerpt":"In most organizations, staff spend many hours in meetings. This project addresses all levels of analysis and understanding, from speaker tracking and robust speech transcription to meaning extraction and summarization, with the goal of increasing productivity both during the meeting and after, for both participants and nonparticipants. The Meeting Recognition and Understanding project is a collection of online and offline spoken language understanding tasks. The following functions could be performed both on- and offline, but…","_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/171185"}]}}]},"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/595954","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-research-item"}],"version-history":[{"count":3,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/595954\/revisions"}],"predecessor-version":[{"id":596140,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/595954\/revisions\/596140"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=595954"}],"wp:term":[{"taxonomy":"msr-content-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-content-type?post=595954"},{"taxonomy":"msr-research-highlight","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-highlight?post=595954"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=595954"},{"taxonomy":"msr-publication-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-publication-type?post=595954"},{"taxonomy":"msr-product-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-product-type?post=595954"},{"taxonomy":"msr-focus-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-focus-area?post=595954"},{"taxonomy":"msr-platform","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-platform?post=595954"},{"taxonomy":"msr-download-source","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-download-source?post=595954"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=595954"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=595954"},{"taxonomy":"msr-field-of-study","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-field-of-study?post=595954"},{"taxonomy":"msr-conference","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-conference?post=595954"},{"taxonomy":"msr-journal","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-journal?post=595954"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=595954"},{"taxonomy":"msr-pillar","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-pillar?post=595954"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}