{"id":748330,"date":"2021-05-28T10:21:26","date_gmt":"2021-05-28T17:21:26","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-event&p=748330"},"modified":"2025-08-06T11:51:22","modified_gmt":"2025-08-06T18:51:22","slug":"icassp-2021","status":"publish","type":"msr-event","link":"https:\/\/www.microsoft.com\/en-us\/research\/event\/icassp-2021\/","title":{"rendered":"Microsoft at ICASSP 2021"},"content":{"rendered":"\n\n
Website:<\/strong> ICASSP 2021 (opens in new tab)<\/span><\/a>Opens in a new tab<\/span><\/p>\n Microsoft is proud to be a Silver sponsor of the 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) event (opens in new tab)<\/span><\/a>. See more details on our contributions below.<\/p>\n <\/p>\n The following Microsoft researchers will chair sessions at the conference.<\/p>\n Zhuo Chen<\/a> All times are displayed in\u00a0Eastern Daylight Time (UTC -4)<\/p>\n 10:00 \u2013 13:30 | Tutorial<\/p>\n Distant conversational speech recognition and analysis: Recent advances, and trends towards end-to-end optimization<\/strong><\/p>\n Presenters: Keisuke Kinoshita, Yusuke Fujita, Naoyuki Kanda<\/a>, Shinji Watanabe<\/p>\n 18:00 \u2013 19:00<\/p>\n Young Professionals Panel Discussion<\/strong><\/p>\n Moderator: Subhro Das 13:00\u00a0\u2013\u00a013:45\u00a0|\u00a0Speech Synthesis 1: Architecture<\/p>\n Lightspeech: Lightweight and Fast Text to Speech with Neural Architecture Search<\/strong> (opens in new tab)<\/span><\/a><\/p>\n Renqian\u00a0Luo,\u00a0Xu Tan<\/a>,\u00a0Rui Wang<\/a>,\u00a0Tao Qin<\/a>,\u00a0Jinzhu\u00a0Li (opens in new tab)<\/span><\/a>,\u00a0Sheng Zhao (opens in new tab)<\/span><\/a>,\u00a0Enhong\u00a0Chen,\u00a0Tie-Yan Liu<\/a><\/p>\n 13:00\u00a0\u2013\u00a013:45\u00a0|\u00a0Speech Synthesis 1: Architecture<\/p>\n A New\u00a0High Quality\u00a0Trajectory Tiling Based Hybrid TTS In Real Time<\/strong> (opens in new tab)<\/span><\/a><\/p>\n Feng-Long Xie, Xin-Hui Li, Wen-Chao\u00a0Su, Li Lu,\u00a0Frank K. Soong<\/a><\/p>\n 13:00\u00a0\u2013\u00a013:45\u00a0|\u00a0Language Modeling 1: Fusion and Training for End-to-End ASR<\/p>\n Internal Language Model Training for Domain-Adaptive End-To-End Speech Recognition<\/strong><\/a><\/p>\n Zhong Meng<\/a>,\u00a0Naoyuki Kanda<\/a>,\u00a0Yashesh Gaur (opens in new tab)<\/span><\/a>,\u00a0Sarangarajan Parthasarathy<\/a>,\u00a0Eric Sun,\u00a0Liang Lu (opens in new tab)<\/span><\/a>,\u00a0Xie Chen<\/a>,\u00a0Jinyu Li<\/a>,\u00a0Yifan Gong<\/a><\/p>\n 13:00\u00a0\u2013\u00a013:45\u00a0|\u00a0Audio and Speech Source Separation 1: Speech Separation (opens in new tab)<\/span><\/a><\/p>\n Session Chair: Zhuo Chen<\/a><\/p>\n Rethinking The Separation Layers\u00a0In\u00a0Speech Separation Networks<\/strong> (opens in new tab)<\/span><\/a><\/p>\n Yi Luo,\u00a0Zhuo Chen<\/a>, Cong Han, Chenda Li,\u00a0Tianyan Zhou (opens in new tab)<\/span><\/a>, Nima\u00a0Mesgarani<\/p>\n 13:00 \u2013 13:45 | Deep Learning Training Methods 3 (opens in new tab)<\/span><\/a><\/p>\n Session Chair: Jinyu Li<\/a><\/p>\n 13:00\u00a0\u2013\u00a013:45\u00a0|\u00a0Brain-Computer Interfaces<\/p>\n Decoding Music Attention from \u201cEEG Headphones\u201d: A User-Friendly Auditory Brain-Computer Interface<\/strong> (opens in new tab)<\/span><\/a><\/p>\n Wenkang\u00a0An, Barbara Shinn-Cunningham,\u00a0Hannes Gamper<\/a>,\u00a0Dimitra Emmanouilidou<\/a>,\u00a0David Johnston<\/a>,\u00a0Mihai Jalobeanu<\/a>,\u00a0Edward Cutrell<\/a>,\u00a0Andrew Wilson<\/a>, Kuan-Jung Chiang,\u00a0Ivan Tashev<\/a><\/p>\n 14:00\u00a0\u2013\u00a014:45\u00a0|\u00a0Speech Enhancement 1: Speech Separation (opens in new tab)<\/span><\/a><\/p>\n Session Chair: Takuya Yoshioka<\/a><\/p>\n Dual-Path Modeling for Long Recording Speech Separation in Meetings<\/strong> (opens in new tab)<\/span><\/a><\/p>\n Chenda Li,\u00a0Zhuo Chen<\/a>, Yi Luo, Cong Han,\u00a0Tianyan Zhou (opens in new tab)<\/span><\/a>, Keisuke Kinoshita, Marc Delcroix, Shinji Watanabe, Yanmin Qian<\/p>\n 
14:00\u00a0\u2013\u00a014:45\u00a0|\u00a0Speech Enhancement 1: Speech Separation<\/p>\n Continuous Speech Separation with Conformer<\/strong> (opens in new tab)<\/span><\/a><\/p>\n Sanyuan\u00a0Chen,\u00a0Yu Wu<\/a>,\u00a0Zhuo Chen<\/a>,\u00a0Jian Wu (opens in new tab)<\/span><\/a>,\u00a0Jinyu Li<\/a>,\u00a0Takuya Yoshioka<\/a>,\u00a0Chengyi Wang (opens in new tab)<\/span><\/a>,\u00a0Shujie Liu<\/a>,\u00a0Ming Zhou (opens in new tab)<\/span><\/a><\/p>\n 14:00 \u2013 14:45 | Speech Enhancement 2: Speech Separation and Dereverberation (opens in new tab)<\/span><\/a><\/p>\n Session Chair: Takuya Yoshioka<\/a><\/p>\n 14:00\u00a0\u2013\u00a014:45\u00a0|\u00a0Speaker Recognition 1: Benchmark Evaluation<\/p>\n Microsoft Speaker\u00a0Diarization\u00a0System for the\u00a0Voxceleb\u00a0Speaker Recognition Challenge 2020<\/strong><\/a><\/p>\n Xiong Xiao (opens in new tab)<\/span><\/a>,\u00a0Naoyuki Kanda<\/a>,\u00a0Zhuo Chen<\/a>,\u00a0Tianyan Zhou (opens in new tab)<\/span><\/a>,\u00a0Takuya Yoshioka<\/a>,\u00a0Sanyuan Chen (opens in new tab)<\/span><\/a>,\u00a0Yong Zhao (opens in new tab)<\/span><\/a>,\u00a0Gang Liu (opens in new tab)<\/span><\/a>,\u00a0Yu Wu<\/a>,\u00a0Jian Wu (opens in new tab)<\/span><\/a>,\u00a0Shujie Liu<\/a>,\u00a0Jinyu Li<\/a>,\u00a0Yifan Gong<\/a><\/p>\n 14:00\u00a0\u2013\u00a014:45\u00a0|\u00a0Dialogue Systems 2: Response Generation<\/p>\n Topic-Aware Dialogue Generation with Two-Hop Based Graph Attention<\/strong> (opens in new tab)<\/span><\/a><\/p>\n Shijie\u00a0Zhou, Wenge Rong,\u00a0Jianfei\u00a0Zhang,\u00a0Yanmeng\u00a0Wang,\u00a0Libin Shi (opens in new tab)<\/span><\/a>, Zhang Xiong<\/p>\n 16:30\u00a0\u2013\u00a017:15\u00a0|\u00a0Speech Recognition 4: Transformer Models 2<\/p>\n Developing Real-Time Streaming Transformer Transducer for Speech Recognition on Large-Scale Dataset<\/strong> (opens in new tab)<\/span><\/a><\/p>\n Xie Chen<\/a>,\u00a0Yu Wu<\/a>,\u00a0Zhenghao Wang (opens in new tab)<\/span><\/a>,\u00a0Shujie Liu<\/a>,\u00a0Jinyu Li<\/a><\/p>\n 16:30\u00a0\u2013\u00a017:15\u00a0|\u00a0Active Noise Control, Echo Reduction, and Feedback Reduction 2: Active Noise Control and Echo Cancellation (opens in new tab)<\/span><\/a><\/p>\n Session Chair: Hannes Gamper<\/a><\/p>\n ICASSP 2021 Acoustic Echo Cancellation Challenge: Datasets, Testing Framework, and Results<\/strong> (opens in new tab)<\/span><\/a><\/p>\n Kusha Sridhar,\u00a0Ross Cutler (opens in new tab)<\/span><\/a>,\u00a0Ando Saabas (opens in new tab)<\/span><\/a>,\u00a0Tanel Parnamaa,\u00a0Markus Loide (opens in new tab)<\/span><\/a>,\u00a0Hannes Gamper<\/a>,\u00a0Sebastian Braun<\/a>,\u00a0Robert Aichner<\/a>,\u00a0Sriram Srinivasan (opens in new tab)<\/span><\/a><\/p>\n 16:30\u00a0\u2013\u00a017:15\u00a0|\u00a0Learning (opens in new tab)<\/span><\/a><\/p>\n Session Chair: Zhong Meng<\/a><\/p>\n Sequence-Level Self-Teaching Regularization<\/strong> (opens in new tab)<\/span><\/a><\/p>\n Eric Sun,\u00a0Liang Lu (opens in new tab)<\/span><\/a>,\u00a0Zhong Meng<\/a>,\u00a0Yifan Gong<\/a><\/p>\n 13:00\u00a0\u2013\u00a013:45\u00a0|\u00a0Language Understanding 1: End-to-end Speech Understanding 1<\/p>\n Speech-Language Pre-Training for End-to-End Spoken Language Understanding<\/strong><\/a><\/p>\n Yao Qian<\/a>, Ximo\u00a0Bian,\u00a0Yu Shi<\/a>,\u00a0Naoyuki Kanda<\/a>,\u00a0Leo Shen,\u00a0Zhen Xiao (opens in new tab)<\/span><\/a>,\u00a0Michael Zeng<\/a><\/p>\n 13:00\u00a0\u2013\u00a013:45\u00a0|\u00a0Audio and Speech Source Separation 4: Multi-Channel Source Separation<\/p>\n DBnet:\u00a0Doa-Driven 
Beamforming Network for end-to-end Reverberant Sound Source Separation<\/strong> (opens in new tab)<\/span><\/a><\/p>\n Ali\u00a0Aroudi,\u00a0Sebastian Braun<\/a><\/p>\n 14:00\u00a0\u2013\u00a014:45\u00a0|\u00a0Speech Enhancement 4: Multi-channel Processing<\/p>\n Don\u2019t Shoot Butterfly with Rifles: Multi-Channel Continuous Speech Separation with Early Exit Transformer<\/strong> (opens in new tab)<\/span><\/a><\/p>\n Sanyuan\u00a0Chen,\u00a0Yu Wu<\/a>,\u00a0Zhuo Chen<\/a>,\u00a0Takuya Yoshioka<\/a>,\u00a0Shujie Liu<\/a>,\u00a0Jinyu Li<\/a>,\u00a0Xiangzhan\u00a0Yu<\/p>\n 14:00\u00a0\u2013\u00a014:45\u00a0|\u00a0Matrix Factorization and Applications<\/p>\n Cold Start Revisited: A Deep Hybrid Recommender with Cold-Warm Item Harmonization<\/strong> (opens in new tab)<\/span><\/a><\/p>\n Oren Barkan,\u00a0Roy Hirsch (opens in new tab)<\/span><\/a>,\u00a0Ori Katz,\u00a0Avi Caciularu (opens in new tab)<\/span><\/a>,\u00a0Yoni Weill,\u00a0Noam Koenigstein (opens in new tab)<\/span><\/a><\/p>\n 14:00\u00a0\u2013\u00a014:45\u00a0|\u00a0Biological Image Analysis<\/p>\n CMIM: Cross-Modal Information Maximization\u00a0For\u00a0Medical Imaging<\/strong> (opens in new tab)<\/span><\/a><\/p>\n Tristan Sylvain, Francis Dutil, Tess Berthier, Lisa Di\u00a0Jorio, Margaux Luck,\u00a0Devon Hjelm<\/a>, Yoshua\u00a0Bengio<\/p>\n 15:30\u00a0\u2013\u00a016:15\u00a0|\u00a0Speech Recognition 8: Multilingual Speech Recognition<\/p>\n Multi-Dialect Speech Recognition in English Using Attention on Ensemble of Experts<\/strong> (opens in new tab)<\/span><\/a><\/p>\n Amit Das (opens in new tab)<\/span><\/a>,\u00a0Kshitiz Kumar (opens in new tab)<\/span><\/a>,\u00a0Jian Wu (opens in new tab)<\/span><\/a><\/p>\n 15:30\u00a0\u2013\u00a016:15\u00a0|\u00a0Quality and Intelligibility Measures<\/p>\n MBNET: MOS Prediction for Synthesized Speech with Mean-Bias Network<\/strong> (opens in new tab)<\/span><\/a><\/p>\n Yichong\u00a0Leng,\u00a0Xu Tan<\/a>,\u00a0Sheng Zhao (opens in new tab)<\/span><\/a>,\u00a0Frank K. 
Soong<\/a>, Xiang-Yang Li,\u00a0Tao Qin<\/a><\/p>\n 15:30\u00a0\u2013\u00a016:15\u00a0|\u00a0Quality and Intelligibility Measures<\/p>\n Crowdsourcing Approach for Subjective Evaluation of Echo Impairment<\/strong> (opens in new tab)<\/span><\/a><\/p>\n Ross Cutler (opens in new tab)<\/span><\/a>, Babak\u00a0Nadari,\u00a0Markus Loide (opens in new tab)<\/span><\/a>,\u00a0Sten Sootla (opens in new tab)<\/span><\/a>,\u00a0Ando Saabas (opens in new tab)<\/span><\/a><\/p>\n 16:30 \u2013 17:15 | Speech Recognition 9: Confidence Measures (opens in new tab)<\/span><\/a><\/p>\n Session Chair: Yifan Gong<\/a><\/p>\n 16:30 \u2013 17:15 | Speech Recognition 10: Robustness to Human Speech Variability (opens in new tab)<\/span><\/a><\/p>\n Session Chair: Yifan Gong<\/a><\/p>\n 16:30\u00a0\u2013\u00a017:15\u00a0|\u00a0Speech Processing 2: General Topics<\/p>\n Dnsmos: A Non-Intrusive Perceptual Objective Speech Quality Metric to Evaluate Noise Suppressors<\/strong> (opens in new tab)<\/span><\/a><\/p>\n Chandan K A Reddy<\/a>,\u00a0Vishak Gopal<\/a>,\u00a0Ross Cutler (opens in new tab)<\/span><\/a><\/p>\n 16:30\u00a0\u2013\u00a017:15\u00a0|\u00a0Style and Text Normalization<\/p>\n Generating Human Readable Transcript for Automatic Speech Recognition with Pre-Trained Language Model<\/strong><\/a><\/p>\n Junwei Liao,\u00a0Yu Shi<\/a>,\u00a0Ming Gong<\/a>,\u00a0Linjun Shou<\/a>,\u00a0Sefik Eskimez<\/a>,\u00a0Liyang Lu<\/a>, Hong Qu,\u00a0Michael Zeng<\/a><\/p>\n 16:30\u00a0\u2013\u00a017:15\u00a0|\u00a0Modeling, Analysis and Synthesis of Acoustic Environments 3: Acoustic Analysis<\/p>\nSession Chairs<\/h3>\n
\nHannes Gamper<\/a>
\nYifan Gong<\/a>
\nJinyu Li<\/a>
\nZhong Meng<\/a>
\nChandan K A Reddy<\/a>
\nIvan Tashev<\/a>
\nTakuya Yoshioka<\/a>Opens in a new tab<\/span><\/p>\nMonday, June 7<\/h2>\n
\nPanelists:\u00a0Sabrina Rashid, Vanessa Testoni,\u00a0Hamid\u00a0Palangi<\/a><\/p>\n
\nTuesday, June 8<\/h2>\n
\nWednesday, June 9<\/h2>\n