All times are displayed in GMT+8
Sunday, October 25
20:00 – 21:30 | Tutorial B-2-1
Neural Approaches to Conversational Information Retrieval
Jianfeng Gao, Chenyan Xiong, Paul Bennett
20:00 – 21:30 | Tutorial B-3-1
Neural Models for Speaker Diarization in the Context of Speech Recognition
Kyu J. Han, Tae Jin Park, Dimitrios Dimitriadis
21:45 – 23:15 | Tutorial B-2-2
Neural Approaches to Conversational Information Retrieval
Jianfeng Gao, Chenyan Xiong, Paul Bennett
21:45 – 23:15 | Tutorial B-3-2
Neural Models for Speaker Diarization in the Context of Speech Recognition
Kyu J. Han, Tae Jin Park, Dimitrios Dimitriadis
Monday, October 26
19:15 – 20:15 | ASR neural network architectures I
On the Comparison of Popular End-to-End Models for Large Scale Speech Recognition (Microsoft Research Asia)
Jinyu Li, Yu Wu, Yashesh Gaur, Chengyi Wang, Rui Zhao, Shujie Liu
19:15 – 20:15 | ASR neural network architectures I
Joint Speaker Counting, Speech Recognition, and Speaker Identification for Overlapped Speech of Any Number of Speakers
Naoyuki Kanda, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Tianyan Zhou, Takuya Yoshioka
19:15 – 20:15 | Multi-channel speech enhancement
Online directional speech enhancement using geometrically constrained independent vector analysis
Li Li, Kazuhito Koishida, Shoji Makino
19:15 – 20:15 | Multi-channel speech enhancement
An End-to-end Architecture of Online Multi-channel Speech Separation
Jian Wu, Zhuo Chen, Jinyu Li, Takuya Yoshioka, Zhili Tan
19:15 – 20:15 | Speech Signal Representation
Robust pitch regression with voiced/unvoiced classification in nonstationary noise environments
Dung Tran, Uros Batricevic, Kazuhito Koishida
19:15 – 20:15 | Speaker Diarization
Online Speaker Diarization with Relation Network
Xiang Li, Yucheng Zhao, Chong Luo, Wenjun Zeng
19:15 – 20:15 | Speaker Diarization
Speaker attribution with voice profiles by graph-based semi-supervised learning
Jixuan Wang (University of Toronto), Xiong Xiao, Jian Wu, Ranjani Ramamurthy, Frank Rudzicz (University of Toronto), Michael Brudno (University of Toronto)
19:15 – 20:15 | Noise robust and distant speech recognition
Neural Speech Separation Using Spatially Distributed Microphones
Dongmei Wang, Zhuo Chen, Takuya Yoshioka
20:30 – 21:30 | ASR neural network architectures and training I
Fast and Slow Acoustic Model
Kshitiz Kumar, Emilian Stoimenov, Hosam Khalil, Jian Wu
20:30 – 21:30 | Evaluation of Speech Technology Systems and Methods for Resource Construction and Annotation
Neural Zero-Inflated Quality Estimation Model For Automatic Speech Recognition System
Kai Fan, Bo Li, Jiayi Wang, Shiliang Zhang, Boxing Chen, Niyu Ge, Zhi-Jie Yan
20:30 – 21:30 | ASR model training and strategies
Semantic Mask for Transformer based End-to-End Speech Recognition
Chengyi Wang, Yu Wu, Yujiao Du, Jinyu Li, Shujie Liu, Liang Lu, Shuo Ren, Guoli Ye, Sheng Zhao, Ming Zhou
20:30 – 21:30 | ASR model training and strategies
A Federated Approach in Training Acoustic Models
Dimitrios Dimitriadis, Kenichi Kumatani, Robert Gmyr, Yashesh Gaur, Sefik Emre Eskimez
21:45 – 22:45 | Cross/multi-lingual and code-switched speech recognition
A 43 Language Multilingual Punctuation Prediction Neural Network Model
Xinxing Li, Edward Lin
21:45 – 22:45 | Singing Voice Computing and Processing in Music
Transfer Learning for Improving Singing-Voice Detection in Polyphonic Instrumental Music
Yuanbo Hou, Frank Soong, Jian Luan, Shengchen Li
21:45 – 22:45 | Acoustic model adaptation for ASR
Rapid RNN-T Adaptation Using Personalized Speech Synthesis and Neural Language Generator
Yan Huang, Jinyu Li, Lei He, Wenning Wei, William Gale, Yifan Gong
21:45 – 22:45 | Singing and Multimodal Synthesis
Adversarially Trained Multi-Singer Sequence-To-Sequence Singing Synthesizer
Jie Wu, Jian Luan
21:45 – 22:45 | Singing and Multimodal Synthesis
XiaoiceSing: A High-Quality and Integrated Singing Voice Synthesis System
Peiling Lu, Jie Wu, Jian Luan, Xu Tan, Li Zhou
21:45 – 22:45 | Student Events
ISCA-SAC: 2nd Mentoring Event
Mentor: Jinyu Li
Tuesday, October 27
19:15 – 20:15 | Feature extraction and distant ASR
Bandpass Noise Generation and Augmentation for Unified ASR
Kshitiz Kumar, Bo Ren, Yifan Gong, Jian Wu
19:15 – 20:15 | Search for speech recognition
Combination of end-to-end and hybrid models for speech recognition
Jeremy Heng Meng Wong, Yashesh Gaur, Rui Zhao, Liang Lu, Eric Sun, Jinyu Li, Yifan Gong
Wednesday, October 28
19:15 – 20:15 | Streaming ASR
1-D Row-Convolution LSTM: Fast Streaming ASR at Accuracy Parity with LC-BLSTM
Kshitiz Kumar, Chaojun Liu, Yifan Gong, Jian Wu
19:15 – 20:15 | Streaming ASR
Low Latency End-to-End Streaming Speech Recognition with a Scout Network
Chengyi Wang, Yu Wu, Liang Lu, Shujie Liu, Jinyu Li, Guoli Ye, Ming Zhou
19:15 – 20:15 | Streaming ASR
Transfer Learning Approaches for Streaming End-to-End Speech Recognition System
Vikas Joshi, Rui Zhao, Rupesh Mehta, Kshitiz Kumar, Jinyu Li
19:15 – 20:15 | Applications of ASR
SpecMark: A Spectral Watermarking Framework for IP Protection of Speech Recognition Systems
Huili Chen, Bita Darvish Rouhani, Farinaz Koushanfar
19:15 – 20:15 | Single-channel speech enhancement I
Low-Latency Single Channel Speech Dereverberation using U-Net Convolutional Neural Networks
Ahmet E. Bulut, Kazuhito Koishida
19:15 – 20:15 | Single-channel speech enhancement I
Single-channel speech enhancement by subspace affinity minimization
Dung Tran, Kazuhito Koishida
19:15 – 20:15 | Deep Noise Suppression Challenge
The INTERSPEECH 2020 Deep Noise Suppression Challenge: Datasets, Subjective Testing Framework, and Challenge Results
Chandan Karadagur Ananda Reddy, Vishak Gopal, Ross Cutler, Ebrahim Beyrami, Roger Cheng, Harishchandra Dubey, Sergiy Matusevych, Robert Aichner, Ashkan Aazami, Sebastian Braun, Puneet Rana, Sriram Srinivasan, Johannes Gehrke
20:30 – 21:30 | Spoken Term Detection
Re-weighted Interval Loss for Handling Data Imbalance Problem of End-to-End Keyword Spotting
Kun Zhang, Zhiyong Wu, Daode Yuan, Jian Luan, Jia Jia, Helen Meng, Binheng Song
20:30 – 21:30 | Training strategies for ASR
Serialized Output Training for End-to-End Overlapped Speech Recognition
Naoyuki Kanda, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Takuya Yoshioka
20:30 – 21:30 | Speech transmission & coding
An Open Source Implementation of ITU-T Recommendation P.808 with Validation
Babak Naderi, Ross Cutler
20:30 – 21:30 | Speech transmission & coding
DNN No-Reference PSTN Speech Quality Prediction
Gabriel Mittag, Ross Cutler, Yasaman Hosseinkashi, Michael Revow, Sriram Srinivasan, Naglakshmi Chande, Robert Aichner
20:30 – 21:30 | Speech Synthesis: Multilingual and Cross-lingual approaches
On Improving Code Mixed Speech Synthesis with Mixlingual Grapheme-to-Phoneme Model
Shubham Bansal, Arijit Mukherjee, Sandeepkumar Satpal, Rupesh Mehta
21:45 – 22:45 | Speech Synthesis Paradigms and Methods II
Towards Universal Text-to-Speech
Jingzhou Yang, Lei He
21:45 – 22:45 | Speech Synthesis Paradigms and Methods II
Enhancing Monotonicity for Robust Autoregressive Transformer TTS
Xiangyu Liang, Zhiyong Wu, Runnan Li, Yanqing Liu, Sheng Zhao
21:45 – 22:45 | Speech Synthesis: Prosody and Emotion
Hierarchical Multi-Grained Generative Model for Expressive Speech Synthesis
Yukiya Hono, Kazuna Tsuboi, Kei Sawada, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, Keiichi Tokuda
21:45 – 22:45 | Speech Synthesis: Prosody and Emotion
GAN-based Data Generation for Speech Emotion Recognition
Sefik Emre Eskimez, Dimitrios Dimitriadis, Robert Gmyr, Kenichi Kumatani
21:45 – 22:45 | Student Events
ISCA-SAC: 7th Students Meet the Experts
Panelist: Sunayana Sitaram
Thursday, October 29
19:15 – 20:15 | Speech Synthesis: Neural Waveform Generation II
An Efficient Subband Linear Prediction for LPCNet-based Neural Synthesis
Yang Cui, Xi Wang, Lei He, Frank Soong
19:15 – 20:15 | ASR neural network architectures and training II
Developing RNN-T Models Surpassing High-Performance Hybrid Models with Customization Capability
Jinyu Li, Rui Zhao, Zhong Meng, Yanqing Liu, Wenning Wei, Sarangarajan Parthasarathy, Vadim Mazalov, Zhenghao Wang, Lei He, Sheng Zhao, Yifan Gong
19:15 – 20:15 | New Trends in self-supervised speech processing
Sequence-level Self-learning with Multiple Hypotheses
Kenichi Kumatani, Dimitrios Dimitriadis, Robert Gmyr, Yashesh Gaur, Sefik Emre Eskimez, Jinyu Li, Michael Zeng
19:15 – 20:15 | Spoken Dialogue System
Discriminative Transfer Learning for Optimizing ASR and Semantic Labeling in Task-oriented Spoken Dialog
Yao Qian, Yu Shi, Michael Zeng
19:15 – 20:15 | Spoken Dialogue System
Datasets and Benchmarks for Task-Oriented Log Dialogue Ranking Task
Xinnuo Xu, Yizhe Zhang, Lars Liden, Sungjin Lee
19:15 – 20:15 | Speech Synthesis: Toward End-to-End Synthesis
MoBoAligner: a Neural Alignment Model for Non-autoregressive TTS with Monotonic Boundary Search
Naihan Li, Shujie Liu, Yanqing Liu, Sheng Zhao, Ming Liu
19:15 – 20:15 | Speech Synthesis: Toward End-to-End Synthesis
MultiSpeech: Multi-Speaker Text to Speech with Transformer
Mingjian Chen, Xu Tan, Yi Ren, Jin Xu, Hao Sun, Sheng Zhao, Tao Qin, Tie-Yan Liu
20:30 – 21:30 | Speech Synthesis: Prosody Modeling
Multi-Reference Neural TTS Stylization with Adversarial Cycle Consistency
Matt Whitehill, Shuang Ma, Daniel McDuff, Yale Song
21:45 – 22:45 | Multilingual and code-switched ASR
Improving Low Resource Code-switched ASR using Augmented Code-switched TTS
Yash Sharma, Basil Abraham, Karan Taneja, Preethi Jyothi
21:45 – 22:45 | ASR neural network architectures II – Transformers
Exploring Transformers for Large-Scale Speech Recognition
Liang Lu, Changliang Liu, Jinyu Li, Yifan Gong