Speech Research Team

The Speech Research Team is part of the Azure Cognitive Services Research (CSR) group and is responsible for fundamental advances in audio, speech, and spoken language processing technologies. We also work closely with engineering and product teams to bring these new technologies into Microsoft products.

We work on a wide range of speech processing problems, including speech enhancement, speech recognition, speaker diarization, multilingual speech recognition, spoken language understanding, end-to-end modeling, self-supervised learning, and multimodal modeling. Our recent work covers the following topics.

Deep learning-based real-time speech enhancement
Monaural and multi-channel speech separation for meeting transcription
Ad hoc microphone arrays
End-to-end modeling for speaker-attributed speech recognition
Unified speech representation learning
Speech-language pre-training

The results of our work are delivered to Microsoft speech technologies and integrated into various products. We also contributed to the development of new services, such as Conversation Transcription in Azure Cognitive Services, which powers the transcription features of several Microsoft products. We received the IEEE Signal Processing Society Conference Best Paper Award for Industry at ICASSP 2022. Our work took first place in the speaker diarization track of VoxSRC-20 (joint work with other Microsoft scientists and Microsoft Research researchers) and achieved breakthrough human parity performance on the Switchboard conversational speech recognition task.