All times are displayed in Eastern Daylight Time (UTC-4).
Monday, June 7
10:00 – 13:30 | Tutorial
Distant conversational speech recognition and analysis: Recent advances, and trends towards end-to-end optimization
Presenters: Keisuke Kinoshita, Yusuke Fujita, Naoyuki Kanda, Shinji Watanabe
18:00 – 19:00 | Young Professionals Panel Discussion
Moderator: Subhro Das
Panelists: Sabrina Rashid, Vanessa Testoni, Hamid Palangi
Tuesday, June 8
13:00 – 13:45 | Speech Synthesis 1: Architecture
LightSpeech: Lightweight and Fast Text to Speech with Neural Architecture Search
Renqian Luo, Xu Tan, Rui Wang, Tao Qin, Jinzhu Li, Sheng Zhao, Enhong Chen, Tie-Yan Liu
13:00 – 13:45 | Speech Synthesis 1: Architecture
A New High Quality Trajectory Tiling Based Hybrid TTS in Real Time
Feng-Long Xie, Xin-Hui Li, Wen-Chao Su, Li Lu, Frank K. Soong
13:00 – 13:45 | Language Modeling 1: Fusion and Training for End-to-End ASR
Internal Language Model Training for Domain-Adaptive End-to-End Speech Recognition
Zhong Meng, Naoyuki Kanda, Yashesh Gaur, Sarangarajan Parthasarathy, Eric Sun, Liang Lu, Xie Chen, Jinyu Li, Yifan Gong
13:00 – 13:45 | Audio and Speech Source Separation 1: Speech Separation
Session Chair: Zhuo Chen
Rethinking the Separation Layers in Speech Separation Networks
Yi Luo, Zhuo Chen, Cong Han, Chenda Li, Tianyan Zhou, Nima Mesgarani
13:00 – 13:45 | Deep Learning Training Methods 3
Session Chair: Jinyu Li
13:00 – 13:45 | Brain-Computer Interfaces
Decoding Music Attention from “EEG Headphones”: A User-Friendly Auditory Brain-Computer Interface
Wenkang An, Barbara Shinn-Cunningham, Hannes Gamper, Dimitra Emmanouilidou, David Johnston, Mihai Jalobeanu, Edward Cutrell, Andrew Wilson, Kuan-Jung Chiang, Ivan Tashev
14:00 – 14:45 | Speech Enhancement 1: Speech Separation
Session Chair: Takuya Yoshioka
Dual-Path Modeling for Long Recording Speech Separation in Meetings
Chenda Li, Zhuo Chen, Yi Luo, Cong Han, Tianyan Zhou, Keisuke Kinoshita, Marc Delcroix, Shinji Watanabe, Yanmin Qian
14:00 – 14:45 | Speech Enhancement 1: Speech Separation
Continuous Speech Separation with Conformer
Sanyuan Chen, Yu Wu, Zhuo Chen, Jian Wu, Jinyu Li, Takuya Yoshioka, Chengyi Wang, Shujie Liu, Ming Zhou
14:00 – 14:45 | Speech Enhancement 2: Speech Separation and Dereverberation
Session Chair: Takuya Yoshioka
14:00 – 14:45 | Speaker Recognition 1: Benchmark Evaluation
Microsoft Speaker Diarization System for the VoxCeleb Speaker Recognition Challenge 2020
Xiong Xiao, Naoyuki Kanda, Zhuo Chen, Tianyan Zhou, Takuya Yoshioka, Sanyuan Chen, Yong Zhao, Gang Liu, Yu Wu, Jian Wu, Shujie Liu, Jinyu Li, Yifan Gong
14:00 – 14:45 | Dialogue Systems 2: Response Generation
Topic-Aware Dialogue Generation with Two-Hop Based Graph Attention
Shijie Zhou, Wenge Rong, Jianfei Zhang, Yanmeng Wang, Libin Shi, Zhang Xiong
16:30 – 17:15 | Speech Recognition 4: Transformer Models 2
Developing Real-Time Streaming Transformer Transducer for Speech Recognition on Large-Scale Dataset
Xie Chen, Yu Wu, Zhenghao Wang, Shujie Liu, Jinyu Li
16:30 – 17:15 | Active Noise Control, Echo Reduction, and Feedback Reduction 2: Active Noise Control and Echo Cancellation
Session Chair: Hannes Gamper
ICASSP 2021 Acoustic Echo Cancellation Challenge: Datasets, Testing Framework, and Results
Kusha Sridhar, Ross Cutler, Ando Saabas, Tanel Parnamaa, Markus Loide, Hannes Gamper, Sebastian Braun, Robert Aichner, Sriram Srinivasan
16:30 – 17:15 | Learning
Session Chair: Zhong Meng
Sequence-Level Self-Teaching Regularization
Eric Sun, Liang Lu, Zhong Meng, Yifan Gong
Wednesday, June 9
13:00 – 13:45 | Language Understanding 1: End-to-end Speech Understanding 1
Speech-Language Pre-Training for End-to-End Spoken Language Understanding
Yao Qian, Ximo Bian, Yu Shi, Naoyuki Kanda, Leo Shen, Zhen Xiao, Michael Zeng
13:00 – 13:45 | Audio and Speech Source Separation 4: Multi-Channel Source Separation
DBnet: DOA-Driven Beamforming Network for End-to-End Reverberant Sound Source Separation
Ali Aroudi, Sebastian Braun
14:00 – 14:45 | Speech Enhancement 4: Multi-channel Processing
Sanyuan Chen, Yu Wu, Zhuo Chen, Takuya Yoshioka, Shujie Liu, Jinyu Li, Xiangzhan Yu
14:00 – 14:45 | Matrix Factorization and Applications
Cold Start Revisited: A Deep Hybrid Recommender with Cold-Warm Item Harmonization
Oren Barkan, Roy Hirsch, Ori Katz, Avi Caciularu, Yoni Weill, Noam Koenigstein
14:00 – 14:45 | Biological Image Analysis
CMIM: Cross-Modal Information Maximization for Medical Imaging
Tristan Sylvain, Francis Dutil, Tess Berthier, Lisa Di Jorio, Margaux Luck, Devon Hjelm, Yoshua Bengio
15:30 – 16:15 | Speech Recognition 8: Multilingual Speech Recognition
Multi-Dialect Speech Recognition in English Using Attention on Ensemble of Experts
Amit Das, Kshitiz Kumar, Jian Wu
15:30 – 16:15 | Quality and Intelligibility Measures
MBNet: MOS Prediction for Synthesized Speech with Mean-Bias Network
Yichong Leng, Xu Tan, Sheng Zhao, Frank K. Soong, Xiang-Yang Li, Tao Qin
15:30 – 16:15 | Quality and Intelligibility Measures
Crowdsourcing Approach for Subjective Evaluation of Echo Impairment
Ross Cutler, Babak Naderi, Markus Loide, Sten Sootla, Ando Saabas
16:30 – 17:15 | Speech Recognition 9: Confidence Measures
Session Chair: Yifan Gong
16:30 – 17:15 | Speech Recognition 10: Robustness to Human Speech Variability
Session Chair: Yifan Gong
16:30 – 17:15 | Speech Processing 2: General Topics
DNSMOS: A Non-Intrusive Perceptual Objective Speech Quality Metric to Evaluate Noise Suppressors
Chandan K A Reddy, Vishak Gopal, Ross Cutler
16:30 – 17:15 | Style and Text Normalization
Junwei Liao, Yu Shi, Ming Gong, Linjun Shou, Sefik Eskimez, Liyang Lu, Hong Qu, Michael Zeng
16:30 – 17:15 | Modeling, Analysis and Synthesis of Acoustic Environments 3: Acoustic Analysis
Prediction of Object Geometry from Acoustic Scattering Using Convolutional Neural Networks
Ziqi Fan, Vibhav Vineet, Chenshen Lu, T.W. Wu, Kyla McMullen
Thursday, June 10
13:00 – 13:45 | Speech Recognition 11: Novel Approaches
Minimum Bayes Risk Training for End-to-End Speaker-Attributed ASR
Naoyuki Kanda, Zhong Meng, Liang Lu, Yashesh Gaur, Xiaofei Wang, Zhuo Chen, Takuya Yoshioka
13:00 – 13:45 | Speech Synthesis 5: Prosody & Style
Speech BERT Embedding for Improving Prosody in Neural TTS
Liping Chen, Yan Deng, Xi Wang, Frank K. Soong, Lei He
13:00 – 13:45 | Speech Synthesis 6: Data Augmentation & Adaptation
AdaSpeech 2: Adaptive Text to Speech with Untranscribed Data
Yuzi Yan, Xu Tan, Bohan Li, Tao Qin, Sheng Zhao, Yuan Shen, Tie-Yan Liu
14:00 – 14:45 | Speech Enhancement 5: DNS Challenge Task
Session Chair: Chandan K A Reddy
ICASSP 2021 Deep Noise Suppression Challenge
Chandan K A Reddy, Harishchandra Dubey, Vishak Gopal, Ross Cutler, Sebastian Braun, Hannes Gamper, Robert Aichner, Sriram Srinivasan
14:00 – 14:45 | Speech Enhancement 6: Multi-modal Processing
Session Chair: Chandan K A Reddy
14:00 – 14:45 | Graph Signal Processing
Fast Hierarchy Preserving Graph Embedding via Subspace Constraints
Xu Chen, Lun Du, Mengyuan Chen, Yun Wang, QingQing Long, Kunqing Xie
15:30 – 16:15 | Speech Recognition 13: Acoustic Modeling 1
Hypothesis Stitcher for End-to-End Speaker-Attributed ASR on Long-Form Multi-Talker Recordings
Xuankai Chang, Naoyuki Kanda, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Takuya Yoshioka
15:30 – 16:15 | Speech Recognition 14: Acoustic Modeling 2
Ensemble Combination between Different Time Segmentations
Jeremy Heng Meng Wong, Dimitrios Dimitriadis, Kenichi Kumatani, Yashesh Gaur, George Polovets, Partha Parthasarathy, Eric Sun, Jinyu Li, Yifan Gong
15:30 – 16:15 | Privacy and Information Security
Detection of Malicious DNS and Web Servers Using Graph-Based Approaches
Jinyuan Jia, Zheng Dong, Jie Li, Jack W. Stokes
16:30 – 17:15 | Language Assessment
Improving Pronunciation Assessment Via Ordinal Regression with Anchored Reference Samples
Bin Su, Shaoguang Mao, Frank K. Soong, Yan Xia, Jonathan Tien, Zhiyong Wu
16:30 – 17:15 | Signal Enhancement and Restoration 1: Deep Learning
Towards Efficient Models for Real-Time Deep Noise Suppression
Sebastian Braun, Hannes Gamper, Chandan K A Reddy, Ivan Tashev
16:30 – 17:15 | Signal Enhancement and Restoration 3: Signal Enhancement
Phoneme-Based Distribution Regularization for Speech Enhancement
Yajing Liu, Xiulian Peng, Zhiwei Xiong, Yan Lu
16:30 – 17:15 | Audio & Images
Session Chair: Ivan Tashev
Friday, June 11
11:30 – 12:15 | Speech Recognition 18: Low Resource ASR
MixSpeech: Data Augmentation for Low-Resource Automatic Speech Recognition
Linghui Meng, Jin Xu, Xu Tan, Jindong Wang, Tao Qin, Bo Xu
11:30 – 12:15 | Speech Synthesis 7: General Topics
DenoiSpeech: Denoising Text to Speech with Frame-Level Noise Modeling
Chen Zhang, Yi Ren, Xu Tan, Jinglin Liu, Kejun Zhang, Tao Qin, Sheng Zhao, Tie-Yan Liu
13:00 – 13:45 | Speech Enhancement 8: Echo Cancellation and Other Tasks
Arun Asokan Nair, Kazuhito Koishida
13:00 – 13:45 | Speaker Diarization
Hidden Markov Model Diarisation with Speaker Location Information
Jeremy Heng Meng Wong, Xiong Xiao, Yifan Gong
13:00 – 13:45 | Detection and Classification of Acoustic Scenes and Events 5: Scenes
Cross-Modal Spectrum Transformation Network for Acoustic Scene Classification
Yang Liu, Alexandros Neophytou, Sunando Sengupta, Eric Sommerlade