{"id":748330,"date":"2021-05-28T10:21:26","date_gmt":"2021-05-28T17:21:26","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-event&p=748330"},"modified":"2025-08-06T11:51:22","modified_gmt":"2025-08-06T18:51:22","slug":"icassp-2021","status":"publish","type":"msr-event","link":"https:\/\/www.microsoft.com\/en-us\/research\/event\/icassp-2021\/","title":{"rendered":"Microsoft at ICASSP 2021"},"content":{"rendered":"\n\n

Website:<\/strong> ICASSP 2021 (opens in new tab)<\/span><\/a><\/p>\n

Microsoft is proud to be a Silver sponsor of the 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (opens in new tab)<\/span><\/a>. See details on our contributions below.<\/p>\n

 <\/p>\n

Session Chairs<\/h3>\n

The following Microsoft researchers will chair sessions at the conference.<\/p>\n

Zhuo Chen<\/a>
\n
Hannes Gamper<\/a>
\n
Yifan Gong<\/a>
\n
Jinyu Li<\/a>
\n
Zhong Meng<\/a>
\n
Chandan K A Reddy<\/a>
\n
Ivan Tashev<\/a>
\n
Takuya Yoshioka<\/a><\/p>\n

All times are displayed in\u00a0Eastern Daylight Time (UTC -4)<\/p>\n

Monday, June 7<\/h2>\n

10:00 \u2013 13:30 | Tutorial<\/p>\n

Distant conversational speech recognition and analysis: Recent advances, and trends towards end-to-end optimization<\/strong><\/p>\n

Presenters: Keisuke Kinoshita, Yusuke Fujita, Naoyuki Kanda<\/a>, Shinji Watanabe<\/p>\n

18:00 \u2013 19:00<\/p>\n

Young Professionals Panel Discussion<\/strong><\/p>\n

Moderator: Subhro Das
\nPanelists:\u00a0Sabrina Rashid, Vanessa Testoni,\u00a0
Hamid\u00a0Palangi<\/a><\/p>\n


\n

Tuesday, June 8<\/h2>\n

13:00\u00a0\u2013\u00a013:45\u00a0|\u00a0Speech Synthesis 1: Architecture<\/p>\n

LightSpeech: Lightweight and Fast Text to Speech with Neural Architecture Search<\/strong> (opens in new tab)<\/span><\/a><\/p>\n

Renqian\u00a0Luo,\u00a0Xu Tan<\/a>,\u00a0Rui Wang<\/a>,\u00a0Tao Qin<\/a>,\u00a0Jinzhu\u00a0Li (opens in new tab)<\/span><\/a>,\u00a0Sheng Zhao (opens in new tab)<\/span><\/a>,\u00a0Enhong\u00a0Chen,\u00a0Tie-Yan Liu<\/a><\/p>\n

13:00\u00a0\u2013\u00a013:45\u00a0|\u00a0Speech Synthesis 1: Architecture<\/p>\n

A New\u00a0High Quality\u00a0Trajectory Tiling Based Hybrid TTS In Real Time<\/strong> (opens in new tab)<\/span><\/a><\/p>\n

Feng-Long Xie, Xin-Hui Li, Wen-Chao\u00a0Su, Li Lu,\u00a0Frank K. Soong<\/a><\/p>\n

13:00\u00a0\u2013\u00a013:45\u00a0|\u00a0Language Modeling 1: Fusion and Training for End-to-End ASR<\/p>\n

Internal Language Model Training for Domain-Adaptive End-To-End Speech Recognition<\/strong><\/a><\/p>\n

Zhong Meng<\/a>,\u00a0Naoyuki Kanda<\/a>,\u00a0Yashesh Gaur (opens in new tab)<\/span><\/a>,\u00a0Sarangarajan Parthasarathy<\/a>,\u00a0Eric Sun,\u00a0Liang Lu (opens in new tab)<\/span><\/a>,\u00a0Xie Chen<\/a>,\u00a0Jinyu Li<\/a>,\u00a0Yifan Gong<\/a><\/p>\n

13:00\u00a0\u2013\u00a013:45\u00a0|\u00a0Audio and Speech Source Separation 1: Speech Separation (opens in new tab)<\/span><\/a><\/p>\n

Session Chair: Zhuo Chen<\/a><\/p>\n

Rethinking The Separation Layers\u00a0In\u00a0Speech Separation Networks<\/strong> (opens in new tab)<\/span><\/a><\/p>\n

Yi Luo,\u00a0Zhuo Chen<\/a>, Cong Han, Chenda Li,\u00a0Tianyan Zhou (opens in new tab)<\/span><\/a>, Nima\u00a0Mesgarani<\/p>\n

13:00 \u2013 13:45 | Deep Learning Training Methods 3 (opens in new tab)<\/span><\/a><\/p>\n

Session Chair: Jinyu Li<\/a><\/p>\n

13:00\u00a0\u2013\u00a013:45\u00a0|\u00a0Brain-Computer Interfaces<\/p>\n

Decoding Music Attention from \u201cEEG Headphones\u201d: A User-Friendly Auditory Brain-Computer Interface<\/strong> (opens in new tab)<\/span><\/a><\/p>\n

Wenkang\u00a0An, Barbara Shinn-Cunningham,\u00a0Hannes Gamper<\/a>,\u00a0Dimitra Emmanouilidou<\/a>,\u00a0David Johnston<\/a>,\u00a0Mihai Jalobeanu<\/a>,\u00a0Edward Cutrell<\/a>,\u00a0Andrew Wilson<\/a>, Kuan-Jung Chiang,\u00a0Ivan Tashev<\/a><\/p>\n

14:00\u00a0\u2013\u00a014:45\u00a0|\u00a0Speech Enhancement 1: Speech Separation (opens in new tab)<\/span><\/a><\/p>\n

Session Chair: Takuya Yoshioka<\/a><\/p>\n

Dual-Path Modeling for Long Recording Speech Separation in Meetings<\/strong> (opens in new tab)<\/span><\/a><\/p>\n

Chenda Li,\u00a0Zhuo Chen<\/a>, Yi Luo, Cong Han,\u00a0Tianyan Zhou (opens in new tab)<\/span><\/a>, Keisuke Kinoshita, Marc Delcroix, Shinji Watanabe, Yanmin Qian<\/p>\n

14:00\u00a0\u2013\u00a014:45\u00a0|\u00a0Speech Enhancement 1: Speech Separation<\/p>\n

Continuous Speech Separation with Conformer<\/strong> (opens in new tab)<\/span><\/a><\/p>\n

Sanyuan\u00a0Chen,\u00a0Yu Wu<\/a>,\u00a0Zhuo Chen<\/a>,\u00a0Jian Wu (opens in new tab)<\/span><\/a>,\u00a0Jinyu Li<\/a>,\u00a0Takuya Yoshioka<\/a>,\u00a0Chengyi Wang (opens in new tab)<\/span><\/a>,\u00a0Shujie Liu<\/a>,\u00a0Ming Zhou (opens in new tab)<\/span><\/a><\/p>\n

14:00 \u2013 14:45 | Speech Enhancement 2: Speech Separation and Dereverberation (opens in new tab)<\/span><\/a><\/p>\n

Session Chair: Takuya Yoshioka<\/a><\/p>\n

14:00\u00a0\u2013\u00a014:45\u00a0|\u00a0Speaker Recognition 1: Benchmark Evaluation<\/p>\n

Microsoft Speaker\u00a0Diarization\u00a0System for the\u00a0VoxCeleb\u00a0Speaker Recognition Challenge 2020<\/strong><\/a><\/p>\n

Xiong Xiao (opens in new tab)<\/span><\/a>,\u00a0Naoyuki Kanda<\/a>,\u00a0Zhuo Chen<\/a>,\u00a0Tianyan Zhou (opens in new tab)<\/span><\/a>,\u00a0Takuya Yoshioka<\/a>,\u00a0Sanyuan Chen (opens in new tab)<\/span><\/a>,\u00a0Yong Zhao (opens in new tab)<\/span><\/a>,\u00a0Gang Liu (opens in new tab)<\/span><\/a>,\u00a0Yu Wu<\/a>,\u00a0Jian Wu (opens in new tab)<\/span><\/a>,\u00a0Shujie Liu<\/a>,\u00a0Jinyu Li<\/a>,\u00a0Yifan Gong<\/a><\/p>\n

14:00\u00a0\u2013\u00a014:45\u00a0|\u00a0Dialogue Systems 2: Response Generation<\/p>\n

Topic-Aware Dialogue Generation with Two-Hop Based Graph Attention<\/strong> (opens in new tab)<\/span><\/a><\/p>\n

Shijie\u00a0Zhou, Wenge Rong,\u00a0Jianfei\u00a0Zhang,\u00a0Yanmeng\u00a0Wang,\u00a0Libin Shi (opens in new tab)<\/span><\/a>, Zhang Xiong<\/p>\n

16:30\u00a0\u2013\u00a017:15\u00a0|\u00a0Speech Recognition 4: Transformer Models 2<\/p>\n

Developing Real-Time Streaming Transformer Transducer for Speech Recognition on Large-Scale Dataset<\/strong> (opens in new tab)<\/span><\/a><\/p>\n

Xie Chen<\/a>,\u00a0Yu Wu<\/a>,\u00a0Zhenghao Wang (opens in new tab)<\/span><\/a>,\u00a0Shujie Liu<\/a>,\u00a0Jinyu Li<\/a><\/p>\n

16:30\u00a0\u2013\u00a017:15\u00a0|\u00a0Active Noise Control, Echo Reduction, and Feedback Reduction 2: Active Noise Control and Echo Cancellation (opens in new tab)<\/span><\/a><\/p>\n

Session Chair: Hannes Gamper<\/a><\/p>\n

ICASSP 2021 Acoustic Echo Cancellation Challenge: Datasets, Testing Framework, and Results<\/strong> (opens in new tab)<\/span><\/a><\/p>\n

Kusha Sridhar,\u00a0Ross Cutler (opens in new tab)<\/span><\/a>,\u00a0Ando Saabas (opens in new tab)<\/span><\/a>,\u00a0Tanel Parnamaa,\u00a0Markus Loide (opens in new tab)<\/span><\/a>,\u00a0Hannes Gamper<\/a>,\u00a0Sebastian Braun<\/a>,\u00a0Robert Aichner<\/a>,\u00a0Sriram Srinivasan (opens in new tab)<\/span><\/a><\/p>\n

16:30\u00a0\u2013\u00a017:15\u00a0|\u00a0Learning (opens in new tab)<\/span><\/a><\/p>\n

Session Chair: Zhong Meng<\/a><\/p>\n

Sequence-Level Self-Teaching Regularization<\/strong> (opens in new tab)<\/span><\/a><\/p>\n

Eric Sun,\u00a0Liang Lu (opens in new tab)<\/span><\/a>,\u00a0Zhong Meng<\/a>,\u00a0Yifan Gong<\/a><\/p>\n


\n

Wednesday, June 9<\/h2>\n

13:00\u00a0\u2013\u00a013:45\u00a0|\u00a0Language Understanding 1: End-to-end Speech Understanding 1<\/p>\n

Speech-Language Pre-Training for End-to-End Spoken Language Understanding<\/strong><\/a><\/p>\n

Yao Qian<\/a>, Ximo\u00a0Bian,\u00a0Yu Shi<\/a>,\u00a0Naoyuki Kanda<\/a>,\u00a0Leo Shen,\u00a0Zhen Xiao (opens in new tab)<\/span><\/a>,\u00a0Michael Zeng<\/a><\/p>\n

13:00\u00a0\u2013\u00a013:45\u00a0|\u00a0Audio and Speech Source Separation 4: Multi-Channel Source Separation<\/p>\n

DBNet:\u00a0DOA-Driven Beamforming Network for End-to-End Reverberant Sound Source Separation<\/strong> (opens in new tab)<\/span><\/a><\/p>\n

Ali\u00a0Aroudi,\u00a0Sebastian Braun<\/a><\/p>\n

14:00\u00a0\u2013\u00a014:45\u00a0|\u00a0Speech Enhancement 4: Multi-channel Processing<\/p>\n

Don\u2019t Shoot Butterfly with Rifles: Multi-Channel Continuous Speech Separation with Early Exit Transformer<\/strong> (opens in new tab)<\/span><\/a><\/p>\n

Sanyuan\u00a0Chen,\u00a0Yu Wu<\/a>,\u00a0Zhuo Chen<\/a>,\u00a0Takuya Yoshioka<\/a>,\u00a0Shujie Liu<\/a>,\u00a0Jinyu Li<\/a>,\u00a0Xiangzhan\u00a0Yu<\/p>\n

14:00\u00a0\u2013\u00a014:45\u00a0|\u00a0Matrix Factorization and Applications<\/p>\n

Cold Start Revisited: A Deep Hybrid Recommender with Cold-Warm Item Harmonization<\/strong> (opens in new tab)<\/span><\/a><\/p>\n

Oren Barkan,\u00a0Roy Hirsch (opens in new tab)<\/span><\/a>,\u00a0Ori Katz,\u00a0Avi Caciularu (opens in new tab)<\/span><\/a>,\u00a0Yoni Weill,\u00a0Noam Koenigstein (opens in new tab)<\/span><\/a><\/p>\n

14:00\u00a0\u2013\u00a014:45\u00a0|\u00a0Biological Image Analysis<\/p>\n

CMIM: Cross-Modal Information Maximization\u00a0For\u00a0Medical Imaging<\/strong> (opens in new tab)<\/span><\/a><\/p>\n

Tristan Sylvain, Francis Dutil, Tess Berthier, Lisa Di\u00a0Jorio, Margaux Luck,\u00a0Devon Hjelm<\/a>, Yoshua\u00a0Bengio<\/p>\n

15:30\u00a0\u2013\u00a016:15\u00a0|\u00a0Speech Recognition 8: Multilingual Speech Recognition<\/p>\n

Multi-Dialect Speech Recognition in English Using Attention on Ensemble of Experts<\/strong> (opens in new tab)<\/span><\/a><\/p>\n

Amit Das (opens in new tab)<\/span><\/a>,\u00a0Kshitiz Kumar (opens in new tab)<\/span><\/a>,\u00a0Jian Wu (opens in new tab)<\/span><\/a><\/p>\n

15:30\u00a0\u2013\u00a016:15\u00a0|\u00a0Quality and Intelligibility Measures<\/p>\n

MBNet: MOS Prediction for Synthesized Speech with Mean-Bias Network<\/strong> (opens in new tab)<\/span><\/a><\/p>\n

Yichong\u00a0Leng,\u00a0Xu Tan<\/a>,\u00a0Sheng Zhao (opens in new tab)<\/span><\/a>,\u00a0Frank K. Soong<\/a>, Xiang-Yang Li,\u00a0Tao Qin<\/a><\/p>\n

15:30\u00a0\u2013\u00a016:15\u00a0|\u00a0Quality and Intelligibility Measures<\/p>\n

Crowdsourcing Approach for Subjective Evaluation of Echo Impairment<\/strong> (opens in new tab)<\/span><\/a><\/p>\n

Ross Cutler (opens in new tab)<\/span><\/a>, Babak\u00a0Naderi,\u00a0Markus Loide (opens in new tab)<\/span><\/a>,\u00a0Sten Sootla (opens in new tab)<\/span><\/a>,\u00a0Ando Saabas (opens in new tab)<\/span><\/a><\/p>\n

16:30 \u2013 17:15 | Speech Recognition 9: Confidence Measures (opens in new tab)<\/span><\/a><\/p>\n

Session Chair: Yifan Gong<\/a><\/p>\n

16:30 \u2013 17:15 | Speech Recognition 10: Robustness to Human Speech Variability (opens in new tab)<\/span><\/a><\/p>\n

Session Chair: Yifan Gong<\/a><\/p>\n

16:30\u00a0\u2013\u00a017:15\u00a0|\u00a0Speech Processing 2: General Topics<\/p>\n

DNSMOS: A Non-Intrusive Perceptual Objective Speech Quality Metric to Evaluate Noise Suppressors<\/strong> (opens in new tab)<\/span><\/a><\/p>\n

Chandan K A Reddy<\/a>,\u00a0Vishak Gopal<\/a>,\u00a0Ross Cutler (opens in new tab)<\/span><\/a><\/p>\n

16:30\u00a0\u2013\u00a017:15\u00a0|\u00a0Style and Text Normalization<\/p>\n

Generating Human Readable Transcript for Automatic Speech Recognition with Pre-Trained Language Model<\/strong><\/a><\/p>\n

Junwei Liao,\u00a0Yu Shi<\/a>,\u00a0Ming Gong<\/a>,\u00a0Linjun Shou<\/a>,\u00a0Sefik Eskimez<\/a>,\u00a0Liyang Lu<\/a>, Hong Qu,\u00a0Michael Zeng<\/a><\/p>\n

16:30\u00a0\u2013\u00a017:15\u00a0|\u00a0Modeling, Analysis and Synthesis of Acoustic Environments 3: Acoustic Analysis<\/p>\n

Prediction of Object Geometry from Acoustic Scattering Using Convolutional Neural Networks<\/strong> (opens in new tab)<\/span><\/a><\/p>\n

Ziqi Fan,\u00a0Vibhav Vineet<\/a>,\u00a0Chenshen\u00a0Lu, T.W. Wu, Kyla McMullen<\/p>\n


\n

Thursday, June 10<\/h2>\n

13:00\u00a0\u2013\u00a013:45\u00a0|\u00a0Speech Recognition 11: Novel Approaches<\/p>\n

Minimum Bayes Risk Training for End-to-End Speaker-Attributed ASR<\/strong><\/a><\/p>\n

Naoyuki Kanda<\/a>,\u00a0Zhong Meng<\/a>,\u00a0Liang Lu (opens in new tab)<\/span><\/a>,\u00a0Yashesh Gaur (opens in new tab)<\/span><\/a>,\u00a0Xiaofei Wang<\/a>,\u00a0Zhuo Chen<\/a>,\u00a0Takuya Yoshioka<\/a><\/p>\n

13:00\u00a0\u2013\u00a013:45\u00a0|\u00a0Speech Synthesis 5: Prosody & Style<\/p>\n

Speech BERT Embedding for Improving Prosody in Neural TTS<\/strong> (opens in new tab)<\/span><\/a><\/p>\n

Liping Chen (opens in new tab)<\/span><\/a>,\u00a0Yan Deng (opens in new tab)<\/span><\/a>,\u00a0Xi Wang (opens in new tab)<\/span><\/a>,\u00a0Frank K. Soong<\/a>,\u00a0Lei He (opens in new tab)<\/span><\/a><\/p>\n

13:00\u00a0\u2013\u00a013:45\u00a0|\u00a0Speech Synthesis 6: Data Augmentation & Adaptation<\/p>\n

AdaSpeech\u00a02: Adaptive Text to Speech with\u00a0Untranscribed\u00a0Data<\/strong> (opens in new tab)<\/span><\/a><\/p>\n

Yuzi Yan,\u00a0Xu Tan<\/a>,\u00a0Bohan Li,\u00a0Tao Qin<\/a>,\u00a0Sheng Zhao (opens in new tab)<\/span><\/a>, Yuan Shen,\u00a0Tie-Yan Liu<\/a><\/p>\n

14:00\u00a0\u2013\u00a014:45\u00a0|\u00a0Speech Enhancement 5: DNS Challenge Task (opens in new tab)<\/span><\/a><\/p>\n

Session Chair: Chandan K A Reddy<\/a><\/p>\n

ICASSP 2021 Deep Noise Suppression Challenge<\/strong> (opens in new tab)<\/span><\/a><\/p>\n

Chandan K A Reddy<\/a>,\u00a0Harishchandra Dubey (opens in new tab)<\/span><\/a>,\u00a0Vishak Gopal<\/a>,\u00a0Ross Cutler (opens in new tab)<\/span><\/a>,\u00a0Sebastian Braun<\/a>,\u00a0Hannes Gamper<\/a>,\u00a0Robert Aichner<\/a>,\u00a0Sriram Srinivasan (opens in new tab)<\/span><\/a><\/p>\n

14:00 \u2013 14:45 | Speech Enhancement 6: Multi-modal Processing (opens in new tab)<\/span><\/a><\/p>\n

Session Chair: Chandan K A Reddy<\/a><\/p>\n

14:00\u00a0\u2013\u00a014:45\u00a0|\u00a0Graph Signal Processing<\/p>\n

Fast Hierarchy Preserving Graph Embedding via Subspace Constraints<\/strong><\/a><\/p>\n

Xu Chen,\u00a0Lun Du<\/a>,\u00a0Mengyuan\u00a0Chen, Yun Wang, QingQing Long,\u00a0Kunqing\u00a0Xie<\/p>\n

15:30\u00a0\u2013\u00a016:15\u00a0|\u00a0Speech Recognition 13: Acoustic Modeling 1<\/p>\n

Hypothesis Stitcher for End-to-End Speaker-Attributed ASR on Long-Form Multi-Talker Recordings<\/strong><\/a><\/p>\n

Xuankai\u00a0Chang,\u00a0Naoyuki Kanda<\/a>,\u00a0Yashesh Gaur (opens in new tab)<\/span><\/a>,\u00a0Xiaofei Wang<\/a>,\u00a0Zhong Meng<\/a>,\u00a0Takuya Yoshioka<\/a><\/p>\n

15:30\u00a0\u2013\u00a016:15\u00a0|\u00a0Speech Recognition 14: Acoustic Modeling 2<\/p>\n

Ensemble Combination between Different Time Segmentations<\/strong> (opens in new tab)<\/span><\/a><\/p>\n

Jeremy Heng Meng Wong (opens in new tab)<\/span><\/a>,\u00a0Dimitrios Dimitriadis<\/a>,\u00a0Kenichi Kumatani (opens in new tab)<\/span><\/a>,\u00a0Yashesh Gaur (opens in new tab)<\/span><\/a>,\u00a0George Polovets (opens in new tab)<\/span><\/a>,\u00a0Partha Parthasarathy<\/a>,\u00a0Eric Sun,\u00a0Jinyu Li<\/a>,\u00a0Yifan Gong<\/a><\/p>\n

15:30\u00a0\u2013\u00a016:15\u00a0|\u00a0Privacy and Information Security<\/p>\n

Detection Of Malicious DNS and Web Servers using Graph-Based Approaches<\/strong> (opens in new tab)<\/span><\/a><\/p>\n

Jinyuan\u00a0Jia,\u00a0Zheng Dong (opens in new tab)<\/span><\/a>,\u00a0Jie Li (opens in new tab)<\/span><\/a>,\u00a0Jack W. Stokes<\/a><\/p>\n

16:30\u00a0\u2013\u00a017:15\u00a0|\u00a0Language Assessment<\/p>\n

Improving Pronunciation Assessment Via Ordinal Regression with Anchored Reference Samples<\/strong> (opens in new tab)<\/span><\/a><\/p>\n

Bin\u00a0Su,\u00a0Shaoguang Mao (opens in new tab)<\/span><\/a>,\u00a0Frank K. Soong<\/a>,\u00a0Yan Xia<\/a>,\u00a0Jonathan Tien<\/a>,\u00a0Zhiyong\u00a0Wu<\/p>\n

16:30\u00a0\u2013\u00a017:15\u00a0|\u00a0Signal Enhancement and Restoration 1: Deep Learning<\/p>\n

Towards Efficient Models for Real-Time Deep Noise Suppression<\/strong> (opens in new tab)<\/span><\/a><\/p>\n

Sebastian Braun<\/a>,\u00a0Hannes Gamper<\/a>,\u00a0Chandan K A Reddy<\/a>,\u00a0Ivan Tashev<\/a><\/p>\n

16:30\u00a0\u2013\u00a017:15\u00a0|\u00a0Signal Enhancement and Restoration 3: Signal Enhancement<\/p>\n

Phoneme-Based Distribution Regularization for Speech Enhancement<\/strong> (opens in new tab)<\/span><\/a><\/p>\n

Yajing\u00a0Liu,\u00a0Xiulian Peng<\/a>, Zhiwei Xiong,\u00a0Yan Lu<\/a><\/p>\n

16:30 \u2013 17:15 | Audio & Images (opens in new tab)<\/span><\/a><\/p>\n

Session Chair: Ivan Tashev<\/a><\/p>\n


\n

Friday, June 11<\/h2>\n

11:30\u00a0\u2013\u00a012:15\u00a0|\u00a0Speech Recognition 18: Low Resource ASR<\/p>\n

MixSpeech: Data Augmentation for Low-Resource Automatic Speech Recognition<\/strong> (opens in new tab)<\/span><\/a><\/p>\n

Linghui\u00a0Meng,\u00a0Jin\u00a0Xu,\u00a0Xu Tan<\/a>,\u00a0Jindong Wang<\/a>,\u00a0Tao Qin<\/a>, Bo Xu<\/p>\n

11:30\u00a0\u2013\u00a012:15\u00a0|\u00a0Speech Synthesis 7: General Topics<\/p>\n

DenoiSpeech: Denoising Text to Speech with Frame-Level Noise Modeling<\/strong><\/a><\/p>\n

Chen Zhang, Yi Ren,\u00a0Xu Tan<\/a>,\u00a0Jinglin\u00a0Liu,\u00a0Kejun\u00a0Zhang,\u00a0Tao Qin<\/a>,\u00a0Sheng Zhao (opens in new tab)<\/span><\/a>,\u00a0Tie-Yan Liu<\/a><\/p>\n

13:00\u00a0\u2013\u00a013:45\u00a0|\u00a0Speech Enhancement 8: Echo Cancellation and Other Tasks<\/p>\n

Cascaded Time + Time-Frequency\u00a0UNet\u00a0for\u00a0Speech Enhancement: Jointly Addressing Clipping, Codec Distortions, and Gaps<\/strong> (opens in new tab)<\/span><\/a><\/p>\n

Arun Asokan Nair,\u00a0Kazuhito Koishida<\/a><\/p>\n

13:00\u00a0\u2013\u00a013:45\u00a0|\u00a0Speaker\u00a0Diarization<\/p>\n

Hidden Markov Model\u00a0Diarisation\u00a0with Speaker Location Information<\/strong> (opens in new tab)<\/span><\/a><\/p>\n

Jeremy Heng\u00a0Meng Wong (opens in new tab)<\/span><\/a>,\u00a0Xiong Xiao (opens in new tab)<\/span><\/a>,\u00a0Yifan Gong<\/a><\/p>\n

13:00\u00a0\u2013\u00a013:45\u00a0|\u00a0Detection and Classification of Acoustic Scenes and Events 5: Scenes<\/p>\n

Cross-Modal Spectrum Transformation Network for Acoustic Scene Classification<\/strong> (opens in new tab)<\/span><\/a><\/p>\n

Yang Liu,\u00a0Alexandros Neophytou<\/a>,\u00a0Sunando Sengupta<\/a>,\u00a0Eric Sommerlade<\/a><\/p>\n

ICASSP 2021 Acoustic Echo Cancellation Challenge<\/h2>\n

The ICASSP 2021 Acoustic Echo Cancellation Challenge (opens in new tab)<\/span><\/a> is intended to stimulate research in acoustic echo cancellation (AEC), an important part of speech enhancement and still a top issue in audio communication and conferencing systems. We received 17 submissions to the challenge from industry and academia. Microsoft is happy to announce the winners of the ICASSP 2021 Acoustic Echo Cancellation Challenge.<\/p>\n

 <\/p>\n

1st place<\/h3>\n

Organization: Amazon
\nAuthors: Jean-Marc Valin, Srikanth Tenneti, Karim Helwani, Umut Isik, Arvindh Krishnaswamy
\nPaper:
Low-Complexity, Real-Time Joint Neural Echo Control and Speech Enhancement Based On PercepNet (opens in new tab)<\/span><\/a><\/p>\n


\n

2nd place<\/h3>\n

Organization: SoundConnect and Alibaba
\nAuthors: Ziteng Wang, Yueyue Na, Zhang Liu, Biao Tian, Qiang Fu
\nPaper:
Weighted recursive least square filter and neural network based residual echo suppression for the AEC-Challenge (opens in new tab)<\/span><\/a><\/p>\n


\n

3rd place<\/h3>\n

Organization: Carl von Ossietzky University Oldenburg
\nAuthors: Nils L. Westhausen, Bernd T. Meyer
\nPaper:
Acoustic echo cancellation with the dual-signal transformation LSTM network (opens in new tab)<\/span><\/a><\/p>\n

 <\/p>\n

ICASSP 2021 Deep Noise Suppression (DNS) Challenge<\/h2>\n

The ICASSP 2021 Deep Noise Suppression (DNS) Challenge (opens in new tab)<\/span><\/a> is intended to stimulate research in noise suppression, an important part of speech enhancement and still a top issue in audio communication and conferencing systems. We received 19 submissions to the challenge from industry and academia. Microsoft is happy to announce the winners of the ICASSP 2021 Deep Noise Suppression Challenge.<\/p>\n

 <\/p>\n

1st place<\/h3>\n

Organization: Institute of Acoustics, Chinese Academy of Sciences
\nAuthors: Andong Li, Wenzhe Liu, Xiaoxue Luo, Chengshi Zheng, Xiaodong Li
\nPaper:
ICASSP 2021 Deep Noise Suppression Challenge: Decoupling Magnitude and Phase Optimization with a Two-Stage Deep Network (opens in new tab)<\/span><\/a><\/p>\n


\n

2nd place<\/h3>\n

Organization: Sogou
\nAuthors: Jingdong Li, Dawei Luo, Yun Liu, Yuanyuan Zhu, Zhaoxia Li, Guohui Cui, Wenqi Tang, Wei Chen
\nPaper:
Densely Connected Multi-Stage Model with Channel Wise Subband Feature for Real-Time Speech Enhancement (opens in new tab)<\/span><\/a><\/p>\n


\n

3rd place<\/h3>\n

Organization: Seoul National University, Supertone
\nAuthors: Hyeong-Seok Choi, Sungjin Park, Jie Hwan Lee, Hoon Heo, Dongsuk Jeon, Kyogu Lee
\nPaper:
Real-Time Denoising and Dereverberation with Tiny Recurrent U-Net (opens in new tab)<\/span><\/a><\/p>\n","protected":false},"excerpt":{"rendered":"

Microsoft is proud to be a Silver sponsor of the 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2021) event.<\/p>\n","protected":false},"featured_media":0,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","msr_startdate":"2021-06-06","msr_enddate":"2021-06-11","msr_location":"Virtual","msr_expirationdate":"","msr_event_recording_link":"","msr_event_link":"","msr_event_link_redirect":false,"msr_event_time":"","msr_hide_region":true,"msr_private_event":false,"msr_hide_image_in_river":0,"footnotes":""},"research-area":[243062,13545],"msr-region":[256048],"msr-event-type":[197941],"msr-video-type":[],"msr-locale":[268875],"msr-program-audience":[],"msr-post-option":[],"msr-impact-theme":[],"class_list":["post-748330","msr-event","type-msr-event","status-publish","hentry","msr-research-area-audio-acoustics","msr-research-area-human-language-technologies","msr-region-global","msr-event-type-conferences","msr-locale-en_us"],"msr_about":"\n\n


15:30\u00a0\u2013\u00a016:15\u00a0|\u00a0Speech Recognition 14: Acoustic Modeling 2<\/p>\n

Ensemble Combination between Different Time Segmentations<\/strong><\/a><\/p>\n

Jeremy Heng Meng Wong<\/a>,\u00a0Dimitrios Dimitriadis<\/a>,\u00a0Kenichi Kumatani<\/a>,\u00a0Yashesh Gaur<\/a>,\u00a0George Polovets<\/a>,\u00a0Partha Parthasarathy<\/a>,\u00a0Eric Sun,\u00a0Jinyu Li<\/a>,\u00a0Yifan Gong<\/a><\/p>\n

15:30\u00a0\u2013\u00a016:15\u00a0|\u00a0Privacy and Information Security<\/p>\n

Detection Of Malicious DNS and Web Servers using Graph-Based Approaches<\/strong><\/a><\/p>\n

Jinyuan\u00a0Jia,\u00a0Zheng Dong<\/a>,\u00a0Jie Li<\/a>,\u00a0Jack W. Stokes<\/a><\/p>\n

16:30\u00a0\u2013\u00a017:15\u00a0|\u00a0Language Assessment<\/p>\n

Improving Pronunciation Assessment Via Ordinal Regression with Anchored Reference Samples<\/strong><\/a><\/p>\n

Bin\u00a0Su,\u00a0Shaoguang Mao<\/a>,\u00a0Frank K. Soong<\/a>,\u00a0Yan Xia<\/a>,\u00a0Jonathan Tien<\/a>,\u00a0Zhiyong\u00a0Wu<\/p>\n

16:30\u00a0\u2013\u00a017:15\u00a0|\u00a0Signal Enhancement and Restoration 1: Deep Learning<\/p>\n

Towards Efficient Models for Real-Time Deep Noise Suppression<\/strong><\/a><\/p>\n

Sebastian Braun<\/a>,\u00a0Hannes Gamper<\/a>,\u00a0Chandan K A Reddy<\/a>,\u00a0Ivan Tashev<\/a><\/p>\n

16:30\u00a0\u2013\u00a017:15\u00a0|\u00a0Signal Enhancement and Restoration 3: Signal Enhancement<\/p>\n

Phoneme-Based Distribution Regularization for Speech Enhancement<\/strong><\/a><\/p>\n

Yajing\u00a0Liu,\u00a0Xiulian Peng<\/a>, Zhiwei Xiong,\u00a0Yan Lu<\/a><\/p>\n

16:30 \u2013 17:15 | Audio & Images<\/a><\/p>\n

Session Chair: Ivan Tashev<\/a><\/p>\n


\n

Friday, June 11<\/h2>\n

11:30\u00a0\u2013\u00a012:15\u00a0|\u00a0Speech Recognition 18: Low Resource ASR<\/p>\n

MixSpeech: Data Augmentation for Low-Resource Automatic Speech Recognition<\/strong><\/a><\/p>\n

Linghui\u00a0Meng,\u00a0Jin\u00a0Xu,\u00a0Xu Tan<\/a>,\u00a0Jindong Wang<\/a>,\u00a0Tao Qin<\/a>, Bo Xu<\/p>\n

11:30\u00a0\u2013\u00a012:15\u00a0|\u00a0Speech Synthesis 7: General Topics<\/p>\n

Denoispeech: Denoising Text to Speech with Frame-Level Noise Modeling<\/strong><\/a><\/p>\n

Chen Zhang, Yi Ren,\u00a0Xu Tan<\/a>,\u00a0Jinglin\u00a0Liu,\u00a0Kejun\u00a0Zhang,\u00a0Tao Qin<\/a>,\u00a0Sheng Zhao<\/a>,\u00a0Tie-Yan Liu<\/a><\/p>\n

13:00\u00a0\u2013\u00a013:45\u00a0|\u00a0Speech Enhancement 8: Echo Cancellation and Other Tasks<\/p>\n

Cascaded Time + Time-Frequency\u00a0Unet\u00a0For\u00a0Speech Enhancement: Jointly Addressing Clipping, Codec Distortions, And Gaps<\/strong><\/a><\/p>\n

Arun Asokan Nair,\u00a0Kazuhito Koishida<\/a><\/p>\n

13:00\u00a0\u2013\u00a013:45\u00a0|\u00a0Speaker\u00a0Diarization<\/p>\n

Hidden Markov Model\u00a0Diarisation\u00a0with Speaker Location Information<\/strong><\/a><\/p>\n

Jeremy Heng\u00a0Meng Wong<\/a>,\u00a0Xiong Xiao<\/a>,\u00a0Yifan Gong<\/a><\/p>\n

13:00\u00a0\u2013\u00a013:45\u00a0|\u00a0Detection and Classification of Acoustic Scenes and Events 5: Scenes<\/p>\n

Cross-Modal Spectrum Transformation Network for Acoustic Scene Classification<\/strong><\/a><\/p>\n

Yang Liu,\u00a0Alexandros Neophytou<\/a>,\u00a0Sunando Sengupta<\/a>,\u00a0Eric Sommerlade<\/a><\/p>\n

ICASSP 2021 Acoustic Echo Cancellation Challenge<\/h2>\n

The ICASSP 2021 Acoustic Echo Cancellation Challenge (opens in new tab)<\/span><\/a> is intended to stimulate research in acoustic echo cancellation (AEC), an important part of speech enhancement and still a top issue in audio communication and conferencing systems. We received 17 submissions to the challenge from industry and academia. Microsoft is happy to announce the winners of the ICASSP 2021 Acoustic Echo Cancellation Challenge.<\/p>\n

 <\/p>\n

1st place<\/h3>\n

Organization: Amazon
\nAuthors: Jean-Marc Valin, Srikanth Tenneti, Karim Helwani, Umut Isik, Arvindh Krishnaswamy
\nPaper:
Low-Complexity, Real-Time Joint Neural Echo Control and Speech Enhancement Based On PercepNet (opens in new tab)<\/span><\/a><\/p>\n


\n

2nd place<\/h3>\n

Organization: SoundConnect and Alibaba
\nAuthors: Ziteng Wang, Yueyue Na, Zhang Liu, Biao Tian, Qiang Fu
\nPaper:
Weighted recursive least square filter and neural network based residual echo suppression for the AEC-Challenge (opens in new tab)<\/span><\/a><\/p>\n


\n

3rd place<\/h3>\n

Organization: Carl von Ossietzky University Oldenburg
\nAuthors: Nils L. Westhausen, Bernd T. Meyer
\nPaper:
Acoustic echo cancellation with the dual-signal transformation LSTM network (opens in new tab)<\/span><\/a><\/p>\n

 <\/p>\n

ICASSP 2021 Deep Noise Suppression (DNS) Challenge<\/h2>\n

The ICASSP 2021 Deep Noise Suppression (DNS) Challenge (opens in new tab)<\/span><\/a> is intended to stimulate research in noise suppression, an important part of speech enhancement and still a top issue in audio communication and conferencing systems. We received 19 submissions to the challenge from industry and academia. Microsoft is happy to announce the winners of the ICASSP 2021 Deep Noise Suppression Challenge.<\/p>\n

 <\/p>\n

1st place<\/h3>\n

Organization: Institute of Acoustics, Chinese Academy of Sciences
\nAuthors: Andong Li, Wenzhe Liu, Xiaoxue Luo, Chengshi Zheng, Xiaodong Li
\nPaper:
ICASSP 2021 Deep Noise Suppression Challenge: Decoupling Magnitude and Phase Optimization with a Two-Stage Deep Network (opens in new tab)<\/span><\/a><\/p>\n


\n

2nd place<\/h3>\n

Organization: Sogou
\nAuthors: Jingdong Li, Dawei Luo, Yun Liu, Yuanyuan Zhu, Zhaoxia Li, Guohui Cui, Wenqi Tang, Wei Chen
\nPaper:
Densely Connected Multi-Stage Model with Channel Wise Subband Feature for Real-Time Speech Enhancement (opens in new tab)<\/span><\/a><\/p>\n


\n

3rd place<\/h3>\n

Organization: Seoul National University, Supertone
\nAuthors: Hyeong-Seok Choi, Sungjin Park, Jie Hwan Lee, Hoon Heo, Dongsuk Jeon, Kyogu Lee
\nPaper:
Real-Time Denoising and Dereverberation with Tiny Recurrent U-Net (opens in new tab)<\/span><\/a><\/p>\n","tab-content":[{"id":0,"name":"About","content":"Microsoft is proud to be a Silver sponsor of the 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) event<\/a>. See more details on our contributions below.\r\n\r\n \r\n

Session Chairs<\/h3>\r\nThe following Microsoft researchers will chair sessions at the conference.\r\n\r\nZhuo Chen<\/a>\r\nHannes Gamper<\/a>\r\nYifan Gong<\/a>\r\nJinyu Li<\/a>\r\nZhong Meng<\/a>\r\nChandan K A Reddy<\/a>\r\nIvan Tashev<\/a>\r\nTakuya Yoshioka<\/a>"},{"id":1,"name":"Sessions","content":"All times are displayed in\u00a0Eastern Daylight Time (UTC -4)\r\n

Monday, June 7<\/h2>\r\n

10:00 \u2013 13:30 | Tutorial<\/p>\r\n

Distant conversational speech recognition and analysis: Recent advances, and trends towards end-to-end optimization<\/strong><\/p>\r\nPresenters: Keisuke Kinoshita, Yusuke Fujita, Naoyuki Kanda<\/a>, Shinji Watanabe\r\n

18:00 \u2013 19:00<\/p>\r\n

Young Professionals Panel Discussion<\/strong><\/p>\r\nModerator: Subhro Das\r\nPanelists:\u00a0Sabrina Rashid, Vanessa Testoni,\u00a0Hamid\u00a0Palangi<\/a>\r\n\r\n


\r\n\r\n

Tuesday, June 8<\/h2>\r\n

13:00\u00a0\u2013\u00a013:45\u00a0|\u00a0Speech Synthesis 1: Architecture<\/p>\r\n

Lightspeech: Lightweight and Fast Text to Speech with Neural Architecture Search<\/strong><\/a><\/p>\r\nRenqian\u00a0Luo,\u00a0Xu Tan<\/a>,\u00a0Rui Wang<\/a>,\u00a0Tao Qin<\/a>,\u00a0Jinzhu\u00a0Li<\/a>,\u00a0Sheng Zhao<\/a>,\u00a0Enhong\u00a0Chen,\u00a0Tie-Yan Liu<\/a>\r\n

13:00\u00a0\u2013\u00a013:45\u00a0|\u00a0Speech Synthesis 1: Architecture<\/p>\r\n

A New\u00a0High Quality\u00a0Trajectory Tiling Based Hybrid TTS In Real Time<\/strong><\/a><\/p>\r\nFeng-Long Xie, Xin-Hui Li, Wen-Chao\u00a0Su, Li Lu,\u00a0Frank K. Soong<\/a>\r\n

13:00\u00a0\u2013\u00a013:45\u00a0|\u00a0Language Modeling 1: Fusion and Training for End-to-End ASR<\/p>\r\n

Internal Language Model Training for Domain-Adaptive End-To-End Speech Recognition<\/strong><\/a><\/p>\r\nZhong Meng<\/a>,\u00a0Naoyuki Kanda<\/a>,\u00a0Yashesh Gaur<\/a>,\u00a0Sarangarajan Parthasarathy<\/a>,\u00a0Eric Sun,\u00a0Liang Lu<\/a>,\u00a0Xie Chen<\/a>,\u00a0Jinyu Li<\/a>,\u00a0Yifan Gong<\/a>\r\n

13:00\u00a0\u2013\u00a013:45\u00a0|\u00a0Audio and Speech Source Separation 1: Speech Separation<\/a><\/p>\r\n

Session Chair: Zhuo Chen<\/a><\/p>\r\n

Rethinking The Separation Layers\u00a0In\u00a0Speech Separation Networks<\/strong><\/a><\/p>\r\nYi Luo,\u00a0Zhuo Chen<\/a>, Cong Han, Chenda Li,\u00a0Tianyan Zhou<\/a>, Nima\u00a0Mesgarani\r\n

13:00 \u2013 13:45 | Deep Learning Training Methods 3<\/a><\/p>\r\nSession Chair: Jinyu Li<\/a>\r\n

13:00\u00a0\u2013\u00a013:45\u00a0|\u00a0Brain-Computer Interfaces<\/p>\r\n

Decoding Music Attention from \u201cEEG Headphones\u201d: A User-Friendly Auditory Brain-Computer Interface<\/strong><\/a><\/p>\r\nWenkang\u00a0An, Barbara Shinn-Cunningham,\u00a0Hannes Gamper<\/a>,\u00a0Dimitra Emmanouilidou<\/a>,\u00a0David Johnston<\/a>,\u00a0Mihai Jalobeanu<\/a>,\u00a0Edward Cutrell<\/a>,\u00a0Andrew Wilson<\/a>, Kuan-Jung Chiang,\u00a0Ivan Tashev<\/a>\r\n

14:00\u00a0\u2013\u00a014:45\u00a0|\u00a0Speech Enhancement 1: Speech Separation<\/a><\/p>\r\nSession Chair: Takuya Yoshioka<\/a>\r\n

Dual-Path Modeling for Long Recording Speech Separation in Meetings<\/strong><\/a><\/p>\r\nChenda Li,\u00a0Zhuo Chen<\/a>, Yi Luo, Cong Han,\u00a0Tianyan Zhou<\/a>, Keisuke Kinoshita, Marc Delcroix, Shinji Watanabe, Yanmin Qian\r\n

14:00\u00a0\u2013\u00a014:45\u00a0|\u00a0Speech Enhancement 1: Speech Separation<\/p>\r\n

Continuous Speech Separation with Conformer<\/strong><\/a><\/p>\r\nSanyuan\u00a0Chen,\u00a0Yu Wu<\/a>,\u00a0Zhuo Chen<\/a>,\u00a0Jian Wu<\/a>,\u00a0Jinyu Li<\/a>,\u00a0Takuya Yoshioka<\/a>,\u00a0Chengyi Wang<\/a>,\u00a0Shujie Liu<\/a>,\u00a0Ming Zhou<\/a>\r\n

14:00 \u2013 14:45 | Speech Enhancement 2: Speech Separation and Dereverberation<\/a><\/p>\r\nSession Chair: Takuya Yoshioka<\/a>\r\n

14:00\u00a0\u2013\u00a014:45\u00a0|\u00a0Speaker Recognition 1: Benchmark Evaluation<\/p>\r\n

Microsoft Speaker\u00a0Diarization\u00a0System for the\u00a0Voxceleb\u00a0Speaker Recognition Challenge 2020<\/strong><\/a><\/p>\r\nXiong Xiao<\/a>,\u00a0Naoyuki Kanda<\/a>,\u00a0Zhuo Chen<\/a>,\u00a0Tianyan Zhou<\/a>,\u00a0Takuya Yoshioka<\/a>,\u00a0Sanyuan Chen<\/a>,\u00a0Yong Zhao<\/a>,\u00a0Gang Liu<\/a>,\u00a0Yu Wu<\/a>,\u00a0Jian Wu<\/a>,\u00a0Shujie Liu<\/a>,\u00a0Jinyu Li<\/a>,\u00a0Yifan Gong<\/a>\r\n

14:00\u00a0\u2013\u00a014:45\u00a0|\u00a0Dialogue Systems 2: Response Generation<\/p>\r\n

Topic-Aware Dialogue Generation with Two-Hop Based Graph Attention<\/strong><\/a><\/p>\r\nShijie\u00a0Zhou, Wenge Rong,\u00a0Jianfei\u00a0Zhang,\u00a0Yanmeng\u00a0Wang,\u00a0Libin Shi<\/a>, Zhang Xiong\r\n

16:30\u00a0\u2013\u00a017:15\u00a0|\u00a0Speech Recognition 4: Transformer Models 2<\/p>\r\n

Developing Real-Time Streaming Transformer Transducer for Speech Recognition on Large-Scale Dataset<\/strong><\/a><\/p>\r\nXie Chen<\/a>,\u00a0Yu Wu<\/a>,\u00a0Zhenghao Wang<\/a>,\u00a0Shujie Liu<\/a>,\u00a0Jinyu Li<\/a>\r\n

16:30\u00a0\u2013\u00a017:15\u00a0|\u00a0Active Noise Control, Echo Reduction, and Feedback Reduction 2: Active Noise Control and Echo Cancellation<\/a><\/p>\r\n

Session Chair: Hannes Gamper<\/a><\/p>\r\n

ICASSP 2021 Acoustic Echo Cancellation Challenge: Datasets, Testing Framework, and Results<\/strong><\/a><\/p>\r\nKusha Sridhar,\u00a0Ross Cutler<\/a>,\u00a0Ando Saabas<\/a>,\u00a0Tanel Parnamaa,\u00a0Markus Loide<\/a>,\u00a0Hannes Gamper<\/a>,\u00a0Sebastian Braun<\/a>,\u00a0Robert Aichner<\/a>,\u00a0Sriram Srinivasan<\/a>\r\n

16:30\u00a0\u2013\u00a017:15\u00a0|\u00a0Learning<\/a><\/p>\r\n

Session Chair: Zhong Meng<\/a><\/p>\r\n

Sequence-Level Self-Teaching Regularization<\/strong><\/a><\/p>\r\nEric Sun,\u00a0Liang Lu<\/a>,\u00a0Zhong Meng<\/a>,\u00a0Yifan Gong<\/a>\r\n\r\n


\r\n\r\n

Wednesday, June 9<\/h2>\r\n

13:00\u00a0\u2013\u00a013:45\u00a0|\u00a0Language Understanding 1: End-to-end Speech Understanding 1<\/p>\r\n

Speech-Language Pre-Training for End-to-End Spoken Language Understanding<\/strong><\/a><\/p>\r\nYao Qian<\/a>, Ximo\u00a0Bian,\u00a0Yu Shi<\/a>,\u00a0Naoyuki Kanda<\/a>,\u00a0Leo Shen,\u00a0Zhen Xiao<\/a>,\u00a0Michael Zeng<\/a>\r\n

13:00\u00a0\u2013\u00a013:45\u00a0|\u00a0Audio and Speech Source Separation 4: Multi-Channel Source Separation<\/p>\r\n

DBnet:\u00a0Doa-Driven Beamforming Network for end-to-end Reverberant Sound Source Separation<\/strong><\/a><\/p>\r\nAli\u00a0Aroudi,\u00a0Sebastian Braun<\/a>\r\n

14:00\u00a0\u2013\u00a014:45\u00a0|\u00a0Speech Enhancement 4: Multi-channel Processing<\/p>\r\n

Don\u2019t Shoot Butterfly with Rifles: Multi-Channel Continuous Speech Separation with Early Exit Transformer<\/strong><\/a><\/p>\r\nSanyuan\u00a0Chen,\u00a0Yu Wu<\/a>,\u00a0Zhuo Chen<\/a>,\u00a0Takuya Yoshioka<\/a>,\u00a0Shujie Liu<\/a>,\u00a0Jinyu Li<\/a>,\u00a0Xiangzhan\u00a0Yu\r\n

14:00\u00a0\u2013\u00a014:45\u00a0|\u00a0Matrix Factorization and Applications<\/p>\r\n

Cold Start Revisited: A Deep Hybrid Recommender with Cold-Warm Item Harmonization<\/strong><\/a><\/p>\r\nOren Barkan,\u00a0Roy Hirsch<\/a>,\u00a0Ori Katz,\u00a0Avi Caciularu<\/a>,\u00a0Yoni Weill,\u00a0Noam Koenigstein<\/a>\r\n

14:00\u00a0\u2013\u00a014:45\u00a0|\u00a0Biological Image Analysis<\/p>\r\n

CMIM: Cross-Modal Information Maximization\u00a0For\u00a0Medical Imaging<\/strong><\/a><\/p>\r\nTristan Sylvain, Francis Dutil, Tess Berthier, Lisa Di\u00a0Jorio, Margaux Luck,\u00a0Devon Hjelm<\/a>, Yoshua\u00a0Bengio\r\n

15:30\u00a0\u2013\u00a016:15\u00a0|\u00a0Speech Recognition 8: Multilingual Speech Recognition<\/p>\r\n

Multi-Dialect Speech Recognition in English Using Attention on Ensemble of Experts<\/strong><\/a><\/p>\r\nAmit Das<\/a>,\u00a0Kshitiz Kumar<\/a>,\u00a0Jian Wu<\/a>\r\n

15:30\u00a0\u2013\u00a016:15\u00a0|\u00a0Quality and Intelligibility Measures<\/p>\r\n

MBNET: MOS Prediction for Synthesized Speech with Mean-Bias Network<\/strong><\/a><\/p>\r\nYichong\u00a0Leng,\u00a0Xu Tan<\/a>,\u00a0Sheng Zhao<\/a>,\u00a0Frank K. Soong<\/a>, Xiang-Yang Li,\u00a0Tao Qin<\/a>\r\n

15:30\u00a0\u2013\u00a016:15\u00a0|\u00a0Quality and Intelligibility Measures<\/p>\r\n

Crowdsourcing Approach for Subjective Evaluation of Echo Impairment<\/strong><\/a><\/p>\r\nRoss Cutler<\/a>, Babak\u00a0Naderi,\u00a0Markus Loide<\/a>,\u00a0Sten Sootla<\/a>,\u00a0Ando Saabas<\/a>\r\n

16:30 \u2013 17:15 | Speech Recognition 9: Confidence Measures<\/a><\/p>\r\nSession Chair: Yifan Gong<\/a>\r\n

16:30 \u2013 17:15 | Speech Recognition 10: Robustness to Human Speech Variability<\/a><\/p>\r\nSession Chair: Yifan Gong<\/a>\r\n

16:30\u00a0\u2013\u00a017:15\u00a0|\u00a0Speech Processing 2: General Topics<\/p>\r\n

Dnsmos: A Non-Intrusive Perceptual Objective Speech Quality Metric to Evaluate Noise Suppressors<\/strong><\/a><\/p>\r\nChandan K A Reddy<\/a>,\u00a0Vishak Gopal<\/a>,\u00a0Ross Cutler<\/a>\r\n

16:30\u00a0\u2013\u00a017:15\u00a0|\u00a0Style and Text Normalization<\/p>\r\n

Generating Human Readable Transcript for Automatic Speech Recognition with Pre-Trained Language Model<\/strong><\/a><\/p>\r\nJunwei Liao,\u00a0Yu Shi<\/a>,\u00a0Ming Gong<\/a>,\u00a0Linjun Shou<\/a>,\u00a0Sefik Eskimez<\/a>,\u00a0Liyang Lu<\/a>, Hong Qu,\u00a0Michael Zeng<\/a>\r\n

16:30\u00a0\u2013\u00a017:15\u00a0|\u00a0Modeling, Analysis and Synthesis of Acoustic Environments 3: Acoustic Analysis<\/p>\r\n

Prediction of Object Geometry from Acoustic Scattering Using Convolutional Neural Networks<\/strong><\/a><\/p>\r\nZiqi Fan,\u00a0Vibhav Vineet<\/a>,\u00a0Chenshen\u00a0Lu, T.W. Wu, Kyla McMullen\r\n\r\n


\r\n\r\n

Thursday, June 10<\/h2>\r\n

13:00\u00a0\u2013\u00a013:45\u00a0|\u00a0Speech Recognition 11: Novel Approaches<\/p>\r\n

Minimum Bayes Risk Training for End-to-End Speaker-Attributed ASR<\/strong><\/a><\/p>\r\nNaoyuki Kanda<\/a>,\u00a0Zhong Meng<\/a>,\u00a0Liang Lu<\/a>,\u00a0Yashesh Gaur<\/a>,\u00a0Xiaofei Wang<\/a>,\u00a0Zhuo Chen<\/a>,\u00a0Takuya Yoshioka<\/a>\r\n

13:00\u00a0\u2013\u00a013:45\u00a0|\u00a0Speech Synthesis 5: Prosody & Style<\/p>\r\n

Speech Bert Embedding for Improving Prosody in Neural TTS<\/strong><\/a><\/p>\r\nLiping Chen<\/a>,\u00a0Yan Deng<\/a>,\u00a0Xi Wang<\/a>,\u00a0Frank K. Soong<\/a>,\u00a0Lei He<\/a>\r\n

13:00\u00a0\u2013\u00a013:45\u00a0|\u00a0Speech Synthesis 6: Data Augmentation & Adaptation<\/p>\r\n

Adaspeech\u00a02: Adaptive Text to Speech with\u00a0Untranscribed\u00a0Data<\/strong><\/a><\/p>\r\nYuzi Yan,\u00a0Xu Tan<\/a>,\u00a0Bohan Li,\u00a0Tao Qin<\/a>,\u00a0Sheng Zhao<\/a>, Yuan Shen,\u00a0Tie-Yan Liu<\/a>\r\n

14:00\u00a0\u2013\u00a014:45\u00a0|\u00a0Speech Enhancement 5: DNS Challenge Task<\/a><\/p>\r\n

Session Chair: Chandan K A Reddy<\/a><\/p>\r\n

ICASSP 2021 Deep Noise Suppression Challenge<\/strong><\/a><\/p>\r\nChandan K A Reddy<\/a>,\u00a0Harishchandra Dubey<\/a>,\u00a0Vishak Gopal<\/a>,\u00a0Ross Cutler<\/a>,\u00a0Sebastian Braun<\/a>,\u00a0Hannes Gamper<\/a>,\u00a0Robert Aichner<\/a>,\u00a0Sriram Srinivasan<\/a>\r\n

14:00 \u2013 14:45 | Speech Enhancement 6: Multi-modal Processing<\/a><\/p>\r\nSession Chair: Chandan K A Reddy<\/a>\r\n

14:00\u00a0\u2013\u00a014:45\u00a0|\u00a0Graph Signal Processing<\/p>\r\n

Fast Hierarchy Preserving Graph Embedding via Subspace Constraints<\/strong><\/a><\/p>\r\nXu Chen,\u00a0Lun Du<\/a>,\u00a0Mengyuan\u00a0Chen, Yun Wang, QingQing Long,\u00a0Kunqing\u00a0Xie\r\n

15:30\u00a0\u2013\u00a016:15\u00a0|\u00a0Speech Recognition 13: Acoustic Modeling 1<\/p>\r\n

Hypothesis Stitcher for End-to-End Speaker-Attributed ASR on Long-Form Multi-Talker Recordings<\/strong><\/a><\/p>\r\nXuankai\u00a0Chang,\u00a0Naoyuki Kanda<\/a>,\u00a0Yashesh Gaur<\/a>,\u00a0Xiaofei Wang<\/a>,\u00a0Zhong Meng<\/a>,\u00a0Takuya Yoshioka<\/a>\r\n

15:30\u00a0\u2013\u00a016:15\u00a0|\u00a0Speech Recognition 14: Acoustic Modeling 2<\/p>\r\n

Ensemble Combination between Different Time Segmentations<\/strong><\/a><\/p>\r\nJeremy Heng Meng Wong<\/a>,\u00a0Dimitrios Dimitriadis<\/a>,\u00a0Kenichi Kumatani<\/a>,\u00a0Yashesh Gaur<\/a>,\u00a0George Polovets<\/a>,\u00a0Partha Parthasarathy<\/a>,\u00a0Eric Sun,\u00a0Jinyu Li<\/a>,\u00a0Yifan Gong<\/a>\r\n

15:30\u00a0\u2013\u00a016:15\u00a0|\u00a0Privacy and Information Security<\/p>\r\n

Detection Of Malicious DNS and Web Servers using Graph-Based Approaches<\/strong><\/a><\/p>\r\nJinyuan\u00a0Jia,\u00a0Zheng Dong<\/a>,\u00a0Jie Li<\/a>,\u00a0Jack W. Stokes<\/a>\r\n

16:30\u00a0\u2013\u00a017:15\u00a0|\u00a0Language Assessment<\/p>\r\n

Improving Pronunciation Assessment Via Ordinal Regression with Anchored Reference Samples<\/strong><\/a><\/p>\r\nBin\u00a0Su,\u00a0Shaoguang Mao<\/a>,\u00a0Frank K. Soong<\/a>,\u00a0Yan Xia<\/a>,\u00a0Jonathan Tien<\/a>,\u00a0Zhiyong\u00a0Wu\r\n

16:30\u00a0\u2013\u00a017:15\u00a0|\u00a0Signal Enhancement and Restoration 1: Deep Learning<\/p>\r\n

Towards Efficient Models for Real-Time Deep Noise Suppression<\/strong><\/a><\/p>\r\nSebastian Braun<\/a>,\u00a0Hannes Gamper<\/a>,\u00a0Chandan K A Reddy<\/a>,\u00a0Ivan Tashev<\/a>\r\n

16:30\u00a0\u2013\u00a017:15\u00a0|\u00a0Signal Enhancement and Restoration 3: Signal Enhancement<\/p>\r\n

Phoneme-Based Distribution Regularization for Speech Enhancement<\/strong><\/a><\/p>\r\nYajing\u00a0Liu,\u00a0Xiulian Peng<\/a>, Zhiwei Xiong,\u00a0Yan Lu<\/a>\r\n

16:30 \u2013 17:15 | Audio & Images<\/a><\/p>\r\nSession Chair: Ivan Tashev<\/a>\r\n\r\n


\r\n\r\n

Friday, June 11<\/h2>\r\n

11:30\u00a0\u2013\u00a012:15\u00a0|\u00a0Speech Recognition 18: Low Resource ASR<\/p>\r\n

MixSpeech: Data Augmentation for Low-Resource Automatic Speech Recognition<\/strong><\/a><\/p>\r\nLinghui\u00a0Meng,\u00a0Jin\u00a0Xu,\u00a0Xu Tan<\/a>,\u00a0Jindong Wang<\/a>,\u00a0Tao Qin<\/a>, Bo Xu\r\n

11:30\u00a0\u2013\u00a012:15\u00a0|\u00a0Speech Synthesis 7: General Topics<\/p>\r\n

Denoispeech: Denoising Text to Speech with Frame-Level Noise Modeling<\/strong><\/a><\/p>\r\nChen Zhang, Yi Ren,\u00a0Xu Tan<\/a>,\u00a0Jinglin\u00a0Liu,\u00a0Kejun\u00a0Zhang,\u00a0Tao Qin<\/a>,\u00a0Sheng Zhao<\/a>,\u00a0Tie-Yan Liu<\/a>\r\n

13:00\u00a0\u2013\u00a013:45\u00a0|\u00a0Speech Enhancement 8: Echo Cancellation and Other Tasks<\/p>\r\n

Cascaded Time + Time-Frequency\u00a0Unet\u00a0For\u00a0Speech Enhancement: Jointly Addressing Clipping, Codec Distortions, And Gaps<\/strong><\/a><\/p>\r\nArun Asokan Nair,\u00a0Kazuhito Koishida<\/a>\r\n

13:00\u00a0\u2013\u00a013:45\u00a0|\u00a0Speaker\u00a0Diarization<\/p>\r\n

Hidden Markov Model\u00a0Diarisation\u00a0with Speaker Location Information<\/strong><\/a><\/p>\r\nJeremy Heng\u00a0Meng Wong<\/a>,\u00a0Xiong Xiao<\/a>,\u00a0Yifan Gong<\/a>\r\n

13:00\u00a0\u2013\u00a013:45\u00a0|\u00a0Detection and Classification of Acoustic Scenes and Events 5: Scenes<\/p>\r\n

Cross-Modal Spectrum Transformation Network for Acoustic Scene Classification<\/strong><\/a><\/p>\r\nYang Liu,\u00a0Alexandros Neophytou<\/a>,\u00a0Sunando Sengupta<\/a>,\u00a0Eric Sommerlade<\/a>"},{"id":2,"name":"Grand Challenges","content":"

ICASSP 2021 Acoustic Echo Cancellation Challenge<\/h2>\r\nThe ICASSP 2021 Acoustic Echo Cancellation Challenge<\/a> is intended to stimulate research in acoustic echo cancellation (AEC), an important part of speech enhancement and still a top issue in audio communication and conferencing systems. We received 17 submissions to the challenge from industry and academia. Microsoft is happy to announce the winners of the ICASSP 2021 Acoustic Echo Cancellation Challenge.\r\n\r\n \r\n

1st place<\/h3>\r\nOrganization: Amazon\r\nAuthors: Jean-Marc Valin, Srikanth Tenneti, Karim Helwani, Umut Isik, Arvindh Krishnaswamy\r\nPaper: Low-Complexity, Real-Time Joint Neural Echo Control and Speech Enhancement Based On PercepNet<\/a>\r\n\r\n
\r\n\r\n

2nd place<\/h3>\r\nOrganization: SoundConnect and Alibaba\r\nAuthors: Ziteng Wang, Yueyue Na, Zhang Liu, Biao Tian, Qiang Fu\r\nPaper: Weighted recursive least square filter and neural network based residual echo suppression for the AEC-Challenge<\/a>\r\n\r\n
\r\n\r\n

3rd place<\/h3>\r\nOrganization: Carl von Ossietzky University Oldenburg\r\nAuthors: Nils L. Westhausen, Bernd T. Meyer\r\nPaper: Acoustic echo cancellation with the dual-signal transformation LSTM network<\/a>\r\n\r\n \r\n

ICASSP 2021 Deep Noise Suppression (DNS) Challenge<\/h2>\r\nThe ICASSP 2021 Deep Noise Suppression (DNS) Challenge<\/a> is intended to stimulate research in noise suppression, an important part of speech enhancement and still a top issue in audio communication and conferencing systems. We received 19 submissions to the challenge from industry and academia. Microsoft is happy to announce the winners of the ICASSP 2021 Deep Noise Suppression Challenge.\r\n\r\n \r\n

1st place<\/h3>\r\nOrganization: Institute of Acoustics, Chinese Academy of Sciences\r\nAuthors: Andong Li, Wenzhe Liu, Xiaoxue Luo, Chengshi Zheng, Xiaodong Li\r\nPaper: ICASSP 2021 Deep Noise Suppression Challenge: Decoupling Magnitude and Phase Optimization with a Two-Stage Deep Network<\/a>\r\n\r\n
\r\n\r\n

2nd place<\/h3>\r\nOrganization: Sogou\r\nAuthors: Jingdong Li, Dawei Luo, Yun Liu, Yuanyuan Zhu, Zhaoxia Li, Guohui Cui, Wenqi Tang, Wei Chen\r\nPaper: Densely Connected Multi-Stage Model with Channel Wise Subband Feature for Real-Time Speech Enhancement<\/a>\r\n\r\n
\r\n\r\n

3rd place<\/h3>\r\nOrganization: Seoul National University, Supertone\r\nAuthors: Hyeong-Seok Choi, Sungjin Park, Jie Hwan Lee, Hoon Heo, Dongsuk Jeon, Kyogu Lee\r\nPaper: Real-Time Denoising and Dereverberation with Tiny Recurrent U-Net<\/a>"}],"msr_startdate":"2021-06-06","msr_enddate":"2021-06-11","msr_event_time":"","msr_location":"Virtual","msr_event_link":"","msr_event_recording_link":"","msr_startdate_formatted":"June 6, 2021","msr_register_text":"Watch now","msr_cta_link":"","msr_cta_text":"","msr_cta_bi_name":"","featured_image_thumbnail":null,"event_excerpt":"Microsoft is proud to be a Silver sponsor of the 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2021) event.","msr_research_lab":[],"related-researchers":[],"msr_impact_theme":[],"related-academic-programs":[],"related-groups":[],"related-projects":[],"related-opportunities":[],"related-publications":[752071,754324,754333,763438,810712,810724,815227],"related-videos":[],"related-posts":[],"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-event\/748330","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-event"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-event"}],"version-history":[{"count":4,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-event\/748330\/revisions"}],"predecessor-version":[{"id":1146872,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-event\/748330\/revisions\/1146872"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=748330"}],"wp:term":[{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=748330"},{"taxonomy":"msr-region","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/
wp-json\/wp\/v2\/msr-region?post=748330"},{"taxonomy":"msr-event-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-event-type?post=748330"},{"taxonomy":"msr-video-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-video-type?post=748330"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=748330"},{"taxonomy":"msr-program-audience","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-program-audience?post=748330"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=748330"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=748330"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}