{"id":748330,"date":"2021-05-28T10:21:26","date_gmt":"2021-05-28T17:21:26","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-event&p=748330"},"modified":"2021-06-09T05:07:19","modified_gmt":"2021-06-09T12:07:19","slug":"icassp-2021","status":"publish","type":"msr-event","link":"https:\/\/www.microsoft.com\/en-us\/research\/event\/icassp-2021\/","title":{"rendered":"Microsoft at ICASSP 2021"},"content":{"rendered":"

Website:<\/strong> ICASSP 2021<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"

Microsoft is proud to be a Silver sponsor of the 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2021) event.<\/p>\n","protected":false},"featured_media":0,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","msr_startdate":"2021-06-06","msr_enddate":"2021-06-11","msr_location":"Virtual","msr_expirationdate":"","msr_event_recording_link":"","msr_event_link":"","msr_event_link_redirect":false,"msr_event_time":"","msr_hide_region":true,"msr_private_event":false,"footnotes":""},"research-area":[243062,13545],"msr-region":[256048],"msr-event-type":[197941],"msr-video-type":[],"msr-locale":[268875],"msr-program-audience":[],"msr-post-option":[],"msr-impact-theme":[],"class_list":["post-748330","msr-event","type-msr-event","status-publish","hentry","msr-research-area-audio-acoustics","msr-research-area-human-language-technologies","msr-region-global","msr-event-type-conferences","msr-locale-en_us"],"msr_about":"Website:<\/strong> ICASSP 2021<\/a>","tab-content":[{"id":0,"name":"About","content":"Microsoft is proud to be a Silver sponsor of the 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) event<\/a>. See more details on our contributions below.\r\n\r\n \r\n

Session Chairs<\/h3>\r\nThe following Microsoft researchers will chair sessions at the conference.\r\n\r\nZhuo Chen<\/a>\r\nHannes Gamper<\/a>\r\nYifan Gong<\/a>\r\nJinyu Li<\/a>\r\nZhong Meng<\/a>\r\nChandan K A Reddy<\/a>\r\nIvan Tashev<\/a>\r\nTakuya Yoshioka<\/a>"},{"id":1,"name":"Sessions","content":"All times are displayed in\u00a0Eastern Daylight Time (UTC -4)\r\n

Monday, June 7<\/h2>\r\n

10:00 \u2013 13:30 | Tutorial<\/p>\r\n

Distant conversational speech recognition and analysis: Recent advances, and trends towards end-to-end optimization<\/strong><\/p>\r\nPresenters: Keisuke Kinoshita, Yusuke Fujita, Naoyuki Kanda<\/a>, Shinji Watanabe\r\n

18:00 \u2013 19:00<\/p>\r\n

Young Professionals Panel Discussion<\/strong><\/p>\r\nModerator: Subhro Das\r\nPanelists:\u00a0Sabrina Rashid, Vanessa Testoni,\u00a0Hamid\u00a0Palangi<\/a>\r\n\r\n


\r\n\r\n

Tuesday, June 8<\/h2>\r\n

13:00\u00a0\u2013\u00a013:45\u00a0|\u00a0Speech Synthesis 1: Architecture<\/p>\r\n

LightSpeech: Lightweight and Fast Text to Speech with Neural Architecture Search<\/strong><\/a><\/p>\r\nRenqian\u00a0Luo,\u00a0Xu Tan<\/a>,\u00a0Rui Wang<\/a>,\u00a0Tao Qin<\/a>,\u00a0Jinzhu\u00a0Li<\/a>,\u00a0Sheng Zhao<\/a>,\u00a0Enhong\u00a0Chen,\u00a0Tie-Yan Liu<\/a>\r\n

13:00\u00a0\u2013\u00a013:45\u00a0|\u00a0Speech Synthesis 1: Architecture<\/p>\r\n

A New\u00a0High Quality\u00a0Trajectory Tiling Based Hybrid TTS In Real Time<\/strong><\/a><\/p>\r\nFeng-Long Xie, Xin-Hui Li, Wen-Chao\u00a0Su, Li Lu,\u00a0Frank K. Soong<\/a>\r\n

13:00\u00a0\u2013\u00a013:45\u00a0|\u00a0Language Modeling 1: Fusion and Training for End-to-End ASR<\/p>\r\n

Internal Language Model Training for Domain-Adaptive End-To-End Speech Recognition<\/strong><\/a><\/p>\r\nZhong Meng<\/a>,\u00a0Naoyuki Kanda<\/a>,\u00a0Yashesh Gaur<\/a>,\u00a0Sarangarajan Parthasarathy<\/a>,\u00a0Eric Sun,\u00a0Liang Lu<\/a>,\u00a0Xie Chen<\/a>,\u00a0Jinyu Li<\/a>,\u00a0Yifan Gong<\/a>\r\n

13:00\u00a0\u2013\u00a013:45\u00a0|\u00a0Audio and Speech Source Separation 1: Speech Separation<\/a><\/p>\r\n

Session Chair: Zhuo Chen<\/a><\/p>\r\n

Rethinking the Separation Layers\u00a0in\u00a0Speech Separation Networks<\/strong><\/a><\/p>\r\nYi Luo,\u00a0Zhuo Chen<\/a>, Cong Han, Chenda Li,\u00a0Tianyan Zhou<\/a>, Nima\u00a0Mesgarani\r\n

13:00 \u2013 13:45 | Deep Learning Training Methods 3<\/a><\/p>\r\nSession Chair: Jinyu Li<\/a>\r\n

13:00\u00a0\u2013\u00a013:45\u00a0|\u00a0Brain-Computer Interfaces<\/p>\r\n

Decoding Music Attention from \u201cEEG Headphones\u201d: A User-Friendly Auditory Brain-Computer Interface<\/strong><\/a><\/p>\r\nWenkang\u00a0An, Barbara Shinn-Cunningham,\u00a0Hannes Gamper<\/a>,\u00a0Dimitra Emmanouilidou<\/a>,\u00a0David Johnston<\/a>,\u00a0Mihai Jalobeanu<\/a>,\u00a0Edward Cutrell<\/a>,\u00a0Andrew Wilson<\/a>, Kuan-Jung Chiang,\u00a0Ivan Tashev<\/a>\r\n

14:00\u00a0\u2013\u00a014:45\u00a0|\u00a0Speech Enhancement 1: Speech Separation<\/a><\/p>\r\nSession Chair: Takuya Yoshioka<\/a>\r\n

Dual-Path Modeling for Long Recording Speech Separation in Meetings<\/strong><\/a><\/p>\r\nChenda Li,\u00a0Zhuo Chen<\/a>, Yi Luo, Cong Han,\u00a0Tianyan Zhou<\/a>, Keisuke Kinoshita, Marc Delcroix, Shinji Watanabe, Yanmin Qian\r\n

14:00\u00a0\u2013\u00a014:45\u00a0|\u00a0Speech Enhancement 1: Speech Separation<\/p>\r\n

Continuous Speech Separation with Conformer<\/strong><\/a><\/p>\r\nSanyuan\u00a0Chen,\u00a0Yu Wu<\/a>,\u00a0Zhuo Chen<\/a>,\u00a0Jian Wu<\/a>,\u00a0Jinyu Li<\/a>,\u00a0Takuya Yoshioka<\/a>,\u00a0Chengyi Wang<\/a>,\u00a0Shujie Liu<\/a>,\u00a0Ming Zhou<\/a>\r\n

14:00 \u2013 14:45 | Speech Enhancement 2: Speech Separation and Dereverberation<\/a><\/p>\r\nSession Chair: Takuya Yoshioka<\/a>\r\n

14:00\u00a0\u2013\u00a014:45\u00a0|\u00a0Speaker Recognition 1: Benchmark Evaluation<\/p>\r\n

Microsoft Speaker\u00a0Diarization\u00a0System for the\u00a0Voxceleb\u00a0Speaker Recognition Challenge 2020<\/strong><\/a><\/p>\r\nXiong Xiao<\/a>,\u00a0Naoyuki Kanda<\/a>,\u00a0Zhuo Chen<\/a>,\u00a0Tianyan Zhou<\/a>,\u00a0Takuya Yoshioka<\/a>,\u00a0Sanyuan Chen<\/a>,\u00a0Yong Zhao<\/a>,\u00a0Gang Liu<\/a>,\u00a0Yu Wu<\/a>,\u00a0Jian Wu<\/a>,\u00a0Shujie Liu<\/a>,\u00a0Jinyu Li<\/a>,\u00a0Yifan Gong<\/a>\r\n

14:00\u00a0\u2013\u00a014:45\u00a0|\u00a0Dialogue Systems 2: Response Generation<\/p>\r\n

Topic-Aware Dialogue Generation with Two-Hop Based Graph Attention<\/strong><\/a><\/p>\r\nShijie\u00a0Zhou, Wenge Rong,\u00a0Jianfei\u00a0Zhang,\u00a0Yanmeng\u00a0Wang,\u00a0Libin Shi<\/a>, Zhang Xiong\r\n

16:30\u00a0\u2013\u00a017:15\u00a0|\u00a0Speech Recognition 4: Transformer Models 2<\/p>\r\n

Developing Real-Time Streaming Transformer Transducer for Speech Recognition on Large-Scale Dataset<\/strong><\/a><\/p>\r\nXie Chen<\/a>,\u00a0Yu Wu<\/a>,\u00a0Zhenghao Wang<\/a>,\u00a0Shujie Liu<\/a>,\u00a0Jinyu Li<\/a>\r\n

16:30\u00a0\u2013\u00a017:15\u00a0|\u00a0Active Noise Control, Echo Reduction, and Feedback Reduction 2: Active Noise Control and Echo Cancellation<\/a><\/p>\r\n

Session Chair: Hannes Gamper<\/a><\/p>\r\n

ICASSP 2021 Acoustic Echo Cancellation Challenge: Datasets, Testing Framework, and Results<\/strong><\/a><\/p>\r\nKusha Sridhar,\u00a0Ross Cutler<\/a>,\u00a0Ando Saabas<\/a>,\u00a0Tanel Parnamaa,\u00a0Markus Loide<\/a>,\u00a0Hannes Gamper<\/a>,\u00a0Sebastian Braun<\/a>,\u00a0Robert Aichner<\/a>,\u00a0Sriram Srinivasan<\/a>\r\n

16:30\u00a0\u2013\u00a017:15\u00a0|\u00a0Learning<\/a><\/p>\r\n

Session Chair: Zhong Meng<\/a><\/p>\r\n

Sequence-Level Self-Teaching Regularization<\/strong><\/a><\/p>\r\nEric Sun,\u00a0Liang Lu<\/a>,\u00a0Zhong Meng<\/a>,\u00a0Yifan Gong<\/a>\r\n\r\n


\r\n\r\n

Wednesday, June 9<\/h2>\r\n

13:00\u00a0\u2013\u00a013:45\u00a0|\u00a0Language Understanding 1: End-to-end Speech Understanding 1<\/p>\r\n

Speech-Language Pre-Training for End-to-End Spoken Language Understanding<\/strong><\/a><\/p>\r\nYao Qian<\/a>, Ximo\u00a0Bian,\u00a0Yu Shi<\/a>,\u00a0Naoyuki Kanda<\/a>,\u00a0Leo Shen,\u00a0Zhen Xiao<\/a>,\u00a0Michael Zeng<\/a>\r\n

13:00\u00a0\u2013\u00a013:45\u00a0|\u00a0Audio and Speech Source Separation 4: Multi-Channel Source Separation<\/p>\r\n

DBnet:\u00a0DOA-Driven Beamforming Network for End-to-End Reverberant Sound Source Separation<\/strong><\/a><\/p>\r\nAli\u00a0Aroudi,\u00a0Sebastian Braun<\/a>\r\n

14:00\u00a0\u2013\u00a014:45\u00a0|\u00a0Speech Enhancement 4: Multi-channel Processing<\/p>\r\n

Don\u2019t Shoot Butterfly with Rifles: Multi-Channel Continuous Speech Separation with Early Exit Transformer<\/strong><\/a><\/p>\r\nSanyuan\u00a0Chen,\u00a0Yu Wu<\/a>,\u00a0Zhuo Chen<\/a>,\u00a0Takuya Yoshioka<\/a>,\u00a0Shujie Liu<\/a>,\u00a0Jinyu Li<\/a>,\u00a0Xiangzhan\u00a0Yu\r\n

14:00\u00a0\u2013\u00a014:45\u00a0|\u00a0Matrix Factorization and Applications<\/p>\r\n

Cold Start Revisited: A Deep Hybrid Recommender with Cold-Warm Item Harmonization<\/strong><\/a><\/p>\r\nOren Barkan,\u00a0Roy Hirsch<\/a>,\u00a0Ori Katz,\u00a0Avi Caciularu<\/a>,\u00a0Yoni Weill,\u00a0Noam Koenigstein<\/a>\r\n

14:00\u00a0\u2013\u00a014:45\u00a0|\u00a0Biological Image Analysis<\/p>\r\n

CMIM: Cross-Modal Information Maximization\u00a0For\u00a0Medical Imaging<\/strong><\/a><\/p>\r\nTristan Sylvain, Francis Dutil, Tess Berthier, Lisa Di\u00a0Jorio, Margaux Luck,\u00a0Devon Hjelm<\/a>, Yoshua\u00a0Bengio\r\n

15:30\u00a0\u2013\u00a016:15\u00a0|\u00a0Speech Recognition 8: Multilingual Speech Recognition<\/p>\r\n

Multi-Dialect Speech Recognition in English Using Attention on Ensemble of Experts<\/strong><\/a><\/p>\r\nAmit Das<\/a>,\u00a0Kshitiz Kumar<\/a>,\u00a0Jian Wu<\/a>\r\n

15:30\u00a0\u2013\u00a016:15\u00a0|\u00a0Quality and Intelligibility Measures<\/p>\r\n

MBNet: MOS Prediction for Synthesized Speech with Mean-Bias Network<\/strong><\/a><\/p>\r\nYichong\u00a0Leng,\u00a0Xu Tan<\/a>,\u00a0Sheng Zhao<\/a>,\u00a0Frank K. Soong<\/a>, Xiang-Yang Li,\u00a0Tao Qin<\/a>\r\n

15:30\u00a0\u2013\u00a016:15\u00a0|\u00a0Quality and Intelligibility Measures<\/p>\r\n

Crowdsourcing Approach for Subjective Evaluation of Echo Impairment<\/strong><\/a><\/p>\r\nRoss Cutler<\/a>, Babak\u00a0Naderi,\u00a0Markus Loide<\/a>,\u00a0Sten Sootla<\/a>,\u00a0Ando Saabas<\/a>\r\n

16:30 \u2013 17:15 | Speech Recognition 9: Confidence Measures<\/a><\/p>\r\nSession Chair: Yifan Gong<\/a>\r\n

16:30 \u2013 17:15 | Speech Recognition 10: Robustness to Human Speech Variability<\/a><\/p>\r\nSession Chair: Yifan Gong<\/a>\r\n

16:30\u00a0\u2013\u00a017:15\u00a0|\u00a0Speech Processing 2: General Topics<\/p>\r\n

DNSMOS: A Non-Intrusive Perceptual Objective Speech Quality Metric to Evaluate Noise Suppressors<\/strong><\/a><\/p>\r\nChandan K A Reddy<\/a>,\u00a0Vishak Gopal<\/a>,\u00a0Ross Cutler<\/a>\r\n

16:30\u00a0\u2013\u00a017:15\u00a0|\u00a0Style and Text Normalization<\/p>\r\n

Generating Human Readable Transcript for Automatic Speech Recognition with Pre-Trained Language Model<\/strong><\/a><\/p>\r\nJunwei Liao,\u00a0Yu Shi<\/a>,\u00a0Ming Gong<\/a>,\u00a0Linjun Shou<\/a>,\u00a0Sefik Eskimez<\/a>,\u00a0Liyang Lu<\/a>, Hong Qu,\u00a0Michael Zeng<\/a>\r\n

16:30\u00a0\u2013\u00a017:15\u00a0|\u00a0Modeling, Analysis and Synthesis of Acoustic Environments 3: Acoustic Analysis<\/p>\r\n

Prediction of Object Geometry from Acoustic Scattering Using Convolutional Neural Networks<\/strong><\/a><\/p>\r\nZiqi Fan,\u00a0Vibhav Vineet<\/a>,\u00a0Chenshen\u00a0Lu, T.W. Wu, Kyla McMullen\r\n\r\n


\r\n\r\n

Thursday, June 10<\/h2>\r\n

13:00\u00a0\u2013\u00a013:45\u00a0|\u00a0Speech Recognition 11: Novel Approaches<\/p>\r\n

Minimum Bayes Risk Training for End-to-End Speaker-Attributed ASR<\/strong><\/a><\/p>\r\nNaoyuki Kanda<\/a>,\u00a0Zhong Meng<\/a>,\u00a0Liang Lu<\/a>,\u00a0Yashesh Gaur<\/a>,\u00a0Xiaofei Wang<\/a>,\u00a0Zhuo Chen<\/a>,\u00a0Takuya Yoshioka<\/a>\r\n

13:00\u00a0\u2013\u00a013:45\u00a0|\u00a0Speech Synthesis 5: Prosody & Style<\/p>\r\n

Speech BERT Embedding for Improving Prosody in Neural TTS<\/strong><\/a><\/p>\r\nLiping Chen<\/a>,\u00a0Yan Deng<\/a>,\u00a0Xi Wang<\/a>,\u00a0Frank K. Soong<\/a>,\u00a0Lei He<\/a>\r\n

13:00\u00a0\u2013\u00a013:45\u00a0|\u00a0Speech Synthesis 6: Data Augmentation & Adaptation<\/p>\r\n

AdaSpeech\u00a02: Adaptive Text to Speech with\u00a0Untranscribed\u00a0Data<\/strong><\/a><\/p>\r\nYuzi Yan,\u00a0Xu Tan<\/a>,\u00a0Bohan Li,\u00a0Tao Qin<\/a>,\u00a0Sheng Zhao<\/a>, Yuan Shen,\u00a0Tie-Yan Liu<\/a>\r\n

14:00\u00a0\u2013\u00a014:45\u00a0|\u00a0Speech Enhancement 5: DNS Challenge Task<\/a><\/p>\r\n

Session Chair: Chandan K A Reddy<\/a><\/p>\r\n

ICASSP 2021 Deep Noise Suppression Challenge<\/strong><\/a><\/p>\r\nChandan K A Reddy<\/a>,\u00a0Harishchandra Dubey<\/a>,\u00a0Vishak Gopal<\/a>,\u00a0Ross Cutler<\/a>,\u00a0Sebastian Braun<\/a>,\u00a0Hannes Gamper<\/a>,\u00a0Robert Aichner<\/a>,\u00a0Sriram Srinivasan<\/a>\r\n

14:00 \u2013 14:45 | Speech Enhancement 6: Multi-modal Processing<\/a><\/p>\r\nSession Chair: Chandan K A Reddy<\/a>\r\n

14:00\u00a0\u2013\u00a014:45\u00a0|\u00a0Graph Signal Processing<\/p>\r\n

Fast Hierarchy Preserving Graph Embedding via Subspace Constraints<\/strong><\/a><\/p>\r\nXu Chen,\u00a0Lun Du<\/a>,\u00a0Mengyuan\u00a0Chen, Yun Wang, QingQing Long,\u00a0Kunqing\u00a0Xie\r\n

15:30\u00a0\u2013\u00a016:15\u00a0|\u00a0Speech Recognition 13: Acoustic Modeling 1<\/p>\r\n

Hypothesis Stitcher for End-to-End Speaker-Attributed ASR on Long-Form Multi-Talker Recordings<\/strong><\/a><\/p>\r\nXuankai\u00a0Chang,\u00a0Naoyuki Kanda<\/a>,\u00a0Yashesh Gaur<\/a>,\u00a0Xiaofei Wang<\/a>,\u00a0Zhong Meng<\/a>,\u00a0Takuya Yoshioka<\/a>\r\n

15:30\u00a0\u2013\u00a016:15\u00a0|\u00a0Speech Recognition 14: Acoustic Modeling 2<\/p>\r\n

Ensemble Combination between Different Time Segmentations<\/strong><\/a><\/p>\r\nJeremy Heng Meng Wong<\/a>,\u00a0Dimitrios Dimitriadis<\/a>,\u00a0Kenichi Kumatani<\/a>,\u00a0Yashesh Gaur<\/a>,\u00a0George Polovets<\/a>,\u00a0Partha Parthasarathy<\/a>,\u00a0Eric Sun,\u00a0Jinyu Li<\/a>,\u00a0Yifan Gong<\/a>\r\n

15:30\u00a0\u2013\u00a016:15\u00a0|\u00a0Privacy and Information Security<\/p>\r\n

Detection of Malicious DNS and Web Servers Using Graph-Based Approaches<\/strong><\/a><\/p>\r\nJinyuan\u00a0Jia,\u00a0Zheng Dong<\/a>,\u00a0Jie Li<\/a>,\u00a0Jack W. Stokes<\/a>\r\n

16:30\u00a0\u2013\u00a017:15\u00a0|\u00a0Language Assessment<\/p>\r\n

Improving Pronunciation Assessment Via Ordinal Regression with Anchored Reference Samples<\/strong><\/a><\/p>\r\nBin\u00a0Su,\u00a0Shaoguang Mao<\/a>,\u00a0Frank K. Soong<\/a>,\u00a0Yan Xia<\/a>,\u00a0Jonathan Tien<\/a>,\u00a0Zhiyong\u00a0Wu\r\n

16:30\u00a0\u2013\u00a017:15\u00a0|\u00a0Signal Enhancement and Restoration 1: Deep Learning<\/p>\r\n

Towards Efficient Models for Real-Time Deep Noise Suppression<\/strong><\/a><\/p>\r\nSebastian Braun<\/a>,\u00a0Hannes Gamper<\/a>,\u00a0Chandan K A Reddy<\/a>,\u00a0Ivan Tashev<\/a>\r\n

16:30\u00a0\u2013\u00a017:15\u00a0|\u00a0Signal Enhancement and Restoration 3: Signal Enhancement<\/p>\r\n

Phoneme-Based Distribution Regularization for Speech Enhancement<\/strong><\/a><\/p>\r\nYajing\u00a0Liu,\u00a0Xiulian Peng<\/a>, Zhiwei Xiong,\u00a0Yan Lu<\/a>\r\n

16:30 \u2013 17:15 | Audio & Images<\/a><\/p>\r\nSession Chair: Ivan Tashev<\/a>\r\n\r\n


\r\n\r\n

Friday, June 11<\/h2>\r\n

11:30\u00a0\u2013\u00a012:15\u00a0|\u00a0Speech Recognition 18: Low Resource ASR<\/p>\r\n

MixSpeech: Data Augmentation for Low-Resource Automatic Speech Recognition<\/strong><\/a><\/p>\r\nLinghui\u00a0Meng,\u00a0Jin\u00a0Xu,\u00a0Xu Tan<\/a>,\u00a0Jindong Wang<\/a>,\u00a0Tao Qin<\/a>, Bo Xu\r\n

11:30\u00a0\u2013\u00a012:15\u00a0|\u00a0Speech Synthesis 7: General Topics<\/p>\r\n

DenoiSpeech: Denoising Text to Speech with Frame-Level Noise Modeling<\/strong><\/a><\/p>\r\nChen Zhang, Yi Ren,\u00a0Xu Tan<\/a>,\u00a0Jinglin\u00a0Liu,\u00a0Kejun\u00a0Zhang,\u00a0Tao Qin<\/a>,\u00a0Sheng Zhao<\/a>,\u00a0Tie-Yan Liu<\/a>\r\n

13:00\u00a0\u2013\u00a013:45\u00a0|\u00a0Speech Enhancement 8: Echo Cancellation and Other Tasks<\/p>\r\n

Cascaded Time + Time-Frequency\u00a0Unet\u00a0for\u00a0Speech Enhancement: Jointly Addressing Clipping, Codec Distortions, and Gaps<\/strong><\/a><\/p>\r\nArun Asokan Nair,\u00a0Kazuhito Koishida<\/a>\r\n

13:00\u00a0\u2013\u00a013:45\u00a0|\u00a0Speaker\u00a0Diarization<\/p>\r\n

Hidden Markov Model\u00a0Diarisation\u00a0with Speaker Location Information<\/strong><\/a><\/p>\r\nJeremy Heng\u00a0Meng Wong<\/a>,\u00a0Xiong Xiao<\/a>,\u00a0Yifan Gong<\/a>\r\n

13:00\u00a0\u2013\u00a013:45\u00a0|\u00a0Detection and Classification of Acoustic Scenes and Events 5: Scenes<\/p>\r\n

Cross-Modal Spectrum Transformation Network for Acoustic Scene Classification<\/strong><\/a><\/p>\r\nYang Liu,\u00a0Alexandros Neophytou<\/a>,\u00a0Sunando Sengupta<\/a>,\u00a0Eric Sommerlade<\/a>"},{"id":2,"name":"Grand Challenges","content":"

ICASSP 2021 Acoustic Echo Cancellation Challenge<\/h2>\r\nThe ICASSP 2021 Acoustic Echo Cancellation Challenge<\/a> is intended to stimulate research in acoustic echo cancellation (AEC), an important part of speech enhancement and still a top issue in audio communication and conferencing systems. We received 17 submissions from industry and academia. Microsoft is happy to announce the winners of the challenge below.\r\n\r\n \r\n

1st place<\/h3>\r\nOrganization: Amazon\r\nAuthors: Jean-Marc Valin, Srikanth Tenneti, Karim Helwani, Umut Isik, Arvindh Krishnaswamy\r\nPaper: Low-Complexity, Real-Time Joint Neural Echo Control and Speech Enhancement Based On PercepNet<\/a>\r\n\r\n
\r\n\r\n

2nd place<\/h3>\r\nOrganization: SoundConnect and Alibaba\r\nAuthors: Ziteng Wang, Yueyue Na, Zhang Liu, Biao Tian, Qiang Fu\r\nPaper: Weighted Recursive Least Square Filter and Neural Network Based Residual Echo Suppression for the AEC-Challenge<\/a>\r\n\r\n
\r\n\r\n

3rd place<\/h3>\r\nOrganization: Carl von Ossietzky University Oldenburg\r\nAuthors: Nils L. Westhausen, Bernd T. Meyer\r\nPaper: Acoustic Echo Cancellation with the Dual-Signal Transformation LSTM Network<\/a>\r\n\r\n \r\n

ICASSP 2021 Deep Noise Suppression (DNS) Challenge<\/h2>\r\nThe ICASSP 2021 Deep Noise Suppression (DNS) Challenge<\/a> is intended to stimulate research in noise suppression, an important part of speech enhancement and still a top issue in audio communication and conferencing systems. We received 19 submissions from industry and academia. Microsoft is happy to announce the winners of the challenge below.\r\n\r\n \r\n

1st place<\/h3>\r\nOrganization: Institute of Acoustics, Chinese Academy of Sciences\r\nAuthors: Andong Li, Wenzhe Liu, Xiaoxue Luo, Chengshi Zheng, Xiaodong Li\r\nPaper: ICASSP 2021 Deep Noise Suppression Challenge: Decoupling Magnitude and Phase Optimization with a Two-Stage Deep Network<\/a>\r\n\r\n
\r\n\r\n

2nd place<\/h3>\r\nOrganization: Sogou\r\nAuthors: Jingdong Li, Dawei Luo, Yun Liu, Yuanyuan Zhu, Zhaoxia Li, Guohui Cui, Wenqi Tang, Wei Chen\r\nPaper: Densely Connected Multi-Stage Model with Channel Wise Subband Feature for Real-Time Speech Enhancement<\/a>\r\n\r\n
\r\n\r\n

3rd place<\/h3>\r\nOrganization: Seoul National University, Supertone\r\nAuthors: Hyeong-Seok Choi, Sungjin Park, Jie Hwan Lee, Hoon Heo, Dongsuk Jeon, Kyogu Lee\r\nPaper: Real-Time Denoising and Dereverberation with Tiny Recurrent U-Net<\/a>"}],"msr_startdate":"2021-06-06","msr_enddate":"2021-06-11","msr_event_time":"","msr_location":"Virtual","msr_event_link":"","msr_event_recording_link":"","msr_startdate_formatted":"June 6, 2021","msr_register_text":"Watch now","msr_cta_link":"","msr_cta_text":"","msr_cta_bi_name":"","featured_image_thumbnail":null,"event_excerpt":"Microsoft is proud to be a Silver sponsor of the 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2021) event.","msr_research_lab":[],"related-researchers":[],"msr_impact_theme":[],"related-academic-programs":[],"related-groups":[],"related-projects":[],"related-opportunities":[],"related-publications":[754333,763438,810712,810724,815227,752071,754324],"related-videos":[],"related-posts":[],"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-event\/748330"}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-event"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-event"}],"version-history":[{"count":3,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-event\/748330\/revisions"}],"predecessor-version":[{"id":752965,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-event\/748330\/revisions\/752965"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=748330"}],"wp:term":[{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=748330"},{"taxonomy":"msr-region","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-region?post=748330"},{"taxonomy":"msr-event-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-event-type?post=748330"},{"taxonomy":"msr-video-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-video-type?post=748330"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=748330"},{"taxonomy":"msr-program-audience","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-program-audience?post=748330"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=748330"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=748330"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}