{"id":699085,"date":"2020-10-19T15:23:24","date_gmt":"2020-10-19T22:23:24","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-event&p=699085"},"modified":"2025-08-06T11:52:21","modified_gmt":"2025-08-06T18:52:21","slug":"interspeech-2020","status":"publish","type":"msr-event","link":"https:\/\/www.microsoft.com\/en-us\/research\/event\/interspeech-2020\/","title":{"rendered":"Microsoft at INTERSPEECH 2020"},"content":{"rendered":"\n\n
Website<\/strong>: INTERSPEECH 2020<\/span><\/a><\/p>\n Microsoft is proud to be a gold sponsor of INTERSPEECH 2020<\/span><\/a>. See more details on our contributions on the sessions tab.<\/p>\n All times are displayed in GMT +8<\/em><\/p>\n
20:00 \u2013 21:30 | Tutorial B-2-1
20:00 \u2013 21:30 | Tutorial B-3-1
21:45 \u2013 23:15 | Tutorial B-2-2
21:45 \u2013 23:15 | Tutorial B-3-2
19:15 \u2013 20:15 | ASR neural network architectures I
19:15 \u2013 20:15 | Multi-channel speech enhancement
19:15 \u2013 20:15 | Speech Signal Representation
19:15 \u2013 20:15 | Speaker Diarization
19:15 \u2013 20:15 | Noise robust and distant speech recognition
20:30 \u2013 21:30 | ASR neural network architectures and training I
20:30 \u2013 21:30 | Evaluation of Speech Technology Systems and Methods for Resource Construction and Annotation
20:30 \u2013 21:30 | ASR model training and strategies
21:45 \u2013 22:45 | Cross\/multi-lingual and code-switched speech recognition
21:45 \u2013 22:45 | Singing Voice Computing and Processing in Music
21:45 \u2013 22:45 | Acoustic model adaptation for ASR
21:45 \u2013 22:45 | Singing and Multimodal Synthesis
21:45 \u2013 22:45 | Student Events
19:15 \u2013 20:15 | Feature extraction and distant ASR
19:15 \u2013 20:15 | Search for speech recognition
19:15 \u2013 20:15 | Streaming ASR
19:15 \u2013 20:15 | Applications of ASR
19:15 \u2013 20:15 | Single-channel speech enhancement I
19:15 \u2013 20:15 | Deep Noise Suppression Challenge
20:30 \u2013 21:30 | Spoken Term Detection
20:30 \u2013 21:30 | Training strategies for ASR
20:30 \u2013 21:30 | Speech transmission & coding
20:30 \u2013 21:30 | Speech Synthesis: Multilingual and Cross-lingual approaches
21:45 \u2013 22:45 | Speech Synthesis Paradigms and Methods II
21:45 \u2013 22:45 | Speech Synthesis: Prosody and Emotion
21:45 \u2013 22:45 | Student Events
19:15 \u2013 20:15 | Speech Synthesis: Neural Waveform Generation II
19:15 \u2013 20:15 | ASR neural network architectures and training II
19:15 \u2013 20:15 | New Trends in self-supervised speech processing
19:15 \u2013 20:15 | Spoken Dialogue System
19:15 \u2013 20:15 | Speech Synthesis: Toward End-to-End Synthesis
20:30 \u2013 21:30 | Speech Synthesis: Prosody Modeling
21:45 \u2013 22:45 | Multilingual and code-switched ASR
21:45 \u2013 22:45 | ASR neural network architectures II \u2013 Transformers
Microsoft is proud to be a gold sponsor of INTERSPEECH 
2020.<\/p>\n","protected":false},"featured_media":0,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","msr_startdate":"2020-10-25","msr_enddate":"2020-10-29","msr_location":"Virtual","msr_expirationdate":"","msr_event_recording_link":"","msr_event_link":"","msr_event_link_redirect":false,"msr_event_time":"","msr_hide_region":false,"msr_private_event":false,"msr_hide_image_in_river":0,"footnotes":""},"research-area":[13545],"msr-region":[256048],"msr-event-type":[197941],"msr-video-type":[],"msr-locale":[268875],"msr-program-audience":[],"msr-post-option":[],"msr-impact-theme":[],"class_list":["post-699085","msr-event","type-msr-event","status-publish","hentry","msr-research-area-human-language-technologies","msr-region-global","msr-event-type-conferences","msr-locale-en_us"],"msr_about":"\n\n
Sunday, October 25<\/h2>\n
\nNeural Approaches to Conversational Information Retrieval<\/strong>
\nJianfeng Gao<\/a>, Chenyan Xiong<\/a>, Paul Bennett<\/a><\/p>\n
\nNeural Models for Speaker Diarization in the Context of Speech Recognition<\/strong>
\nKyu J. Han, Tae Jin Park, Dimitrios Dimitriadis<\/a><\/p>\n
Monday, October 26<\/h2>\n
\nOn the Comparison of Popular End-to-End Models for Large Scale Speech Recognition<\/strong> (Microsoft Research Asia)
\nJinyu Li<\/a>, Yu Wu, Yashesh Gaur, Chengyi Wang, Rui Zhao, Shujie Liu<\/a><\/p>\n
\nJoint Speaker Counting, Speech Recognition, and Speaker Identification for Overlapped Speech of Any Number of Speakers<\/strong>
\nNaoyuki Kanda<\/a>, Yashesh Gaur, Xiaofei Wang<\/a>, Zhong Meng, Zhuo Chen, Tianyan Zhou, Takuya Yoshioka<\/a><\/p>\n
\nOnline directional speech enhancement using geometrically constrained independent vector analysis<\/strong>
\nLi Li, Kazuhito Koishida, Shoji Makino<\/p>\n
\nAn End-to-end Architecture of Online Multi-channel Speech Separation<\/strong>
\nJian Wu, Zhuo Chen, Jinyu Li<\/a>, Takuya Yoshioka<\/a>, Zhili Tan<\/p>\n
\nRobust pitch regression with voiced\/unvoiced classification in nonstationary noise environments<\/strong>
\nDung Tran, Uros Batricevic, Kazuhito Koishida<\/p>\n
\nOnline Speaker Diarization with Relation Network<\/strong>
\nXiang Li, Yucheng Zhao, Chong Luo<\/a>, Wenjun Zeng<\/a><\/p>\n
\nSpeaker attribution with voice profiles by graph-based semi-supervised learning<\/strong>
\nJixuan Wang (University of Toronto), Xiong Xiao, Jian Wu, Ranjani Ramamurthy, Frank Rudzicz (University of Toronto), Michael Brudno (University of Toronto)<\/p>\n
\nNeural Speech Separation Using Spatially Distributed Microphones<\/strong>
\nDongmei Wang<\/a>, Zhuo Chen, Takuya Yoshioka<\/a><\/p>\n
\nFast and Slow Acoustic Model<\/strong>
\nKshitiz Kumar, Emilian Stoimenov, Hosam Khalil, Jian Wu<\/p>\n
\nNeural Zero-Inflated Quality Estimation Model For Automatic Speech Recognition System<\/strong>
\nKai Fan, Bo Li, Jiayi Wang, Shiliang Zhang, Boxing Chen, Niyu Ge, Zhi-Jie Yan<\/p>\n
\nSemantic Mask for Transformer based End-to-End Speech Recognition<\/strong>
\nChengyi Wang, Yu Wu, Yujiao Du, Jinyu Li<\/a>, Shujie Liu<\/a>, Liang Lu, Shuo Ren, Guoli Ye, Sheng Zhao, Ming Zhou<\/a><\/p>\n
\nA Federated Approach in Training Acoustic Models<\/strong>
\nDimitrios Dimitriadis<\/a>, Kenichi Kumatani, Robert Gmyr<\/a>, Yashesh Gaur, Sefik Emre Eskimez<\/a><\/p>\n
\nA 43 Language Multilingual Punctuation Prediction Neural Network Model<\/strong>
\nXinxing Li, Edward Lin<\/p>\n
\nTransfer Learning for Improving Singing-Voice Detection in Polyphonic Instrumental Music<\/strong>
\nYuanbo Hou, Frank Soong<\/a>, Jian Luan, Shengchen Li<\/p>\n
\nRapid RNN-T Adaptation Using Personalized Speech Synthesis and Neural Language Generator<\/strong>
\nYan Huang<\/a>, Jinyu Li<\/a>, Lei He, Wenning Wei, William Gale, Yifan Gong<\/a><\/p>\n
\nAdversarially Trained Multi-Singer Sequence-To-Sequence Singing Synthesizer<\/strong>
\nJie Wu, Jian Luan<\/p>\n
\nXiaoiceSing: A High-Quality and Integrated Singing Voice Synthesis System<\/strong>
\nPeiling Lu, Jie Wu, Jian Luan, Xu Tan<\/a>, Li Zhou<\/p>\n
\nISCA-SAC: 2nd Mentoring Event<\/strong>
\nMentor: Jinyu Li<\/a><\/p>\nTuesday, October 27<\/h2>\n
\nBandpass Noise Generation and Augmentation for Unified ASR<\/strong>
\nKshitiz Kumar, Bo Ren, Yifan Gong<\/a>, Jian Wu<\/p>\n
\nCombination of end-to-end and hybrid models for speech recognition<\/strong>
\nJeremy Heng Meng Wong, Yashesh Gaur, Rui Zhao, Liang Lu, Eric Sun, Jinyu Li<\/a>, Yifan Gong<\/a><\/p>\nWednesday, October 28<\/h2>\n
\n1-D Row-Convolution LSTM: Fast Streaming ASR at Accuracy Parity with LC-BLSTM<\/strong>
\nKshitiz Kumar, Chaojun Liu, Yifan Gong<\/a>, Jian Wu<\/p>\n
\nLow Latency End-to-End Streaming Speech Recognition with a Scout Network<\/strong>
\nChengyi Wang, Yu Wu, Liang Lu, Shujie Liu<\/a>, Jinyu Li<\/a>, Guoli Ye, Ming Zhou<\/a><\/p>\n
\nTransfer Learning Approaches for Streaming End-to-End Speech Recognition System<\/strong>
\nVikas Joshi, Rui Zhao, Rupesh Mehta, Kshitiz Kumar, Jinyu Li<\/a><\/p>\n
\nSpecMark: A Spectral Watermarking Framework for IP Protection of Speech Recognition Systems<\/strong>
\nHuili Chen, Bita Darvish Rouhani, Farinaz Koushanfar<\/p>\n
\nLow-Latency Single Channel Speech Dereverberation using U-Net Convolutional Neural Networks<\/strong>
\nAhmet E. Bulut, Kazuhito Koishida<\/p>\n
\nSingle-channel speech enhancement by subspace affinity minimization<\/strong>
\nDung Tran, Kazuhito Koishida<\/p>\n
\nThe INTERSPEECH 2020 Deep Noise Suppression Challenge: Datasets, Subjective Testing Framework, and Challenge Results<\/strong>
\nChandan Karadagur Ananda Reddy, Vishak Gopal, Ross Cutler, Ebrahim Beyrami, Roger Cheng, Harishchandra Dubey, Sergiy Matusevych, Robert Aichner, Ashkan Aazami, Sebastian Braun<\/a>, Puneet Rana, Sriram Srinivasan, Johannes Gehrke<\/a><\/p>\n
\nRe-weighted Interval Loss for Handling Data Imbalance Problem of End-to-End Keyword Spotting<\/strong>
\nKun Zhang, Zhiyong Wu, Daode Yuan, Jian Luan, Jia Jia, Helen Meng, Binheng Song<\/p>\n
\nSerialized Output Training for End-to-End Overlapped Speech Recognition<\/strong>
\nNaoyuki Kanda<\/a>, Yashesh Gaur, Xiaofei Wang<\/a>, Zhong Meng, Takuya Yoshioka<\/a><\/p>\n
\nAn Open Source Implementation of ITU-T Recommendation P.808 with Validation<\/strong>
\nBabak Naderi, Ross Cutler<\/p>\n
\nDNN No-Reference PSTN Speech Quality Prediction<\/strong>
\nGabriel Mittag, Ross Cutler, Yasaman Hosseinkashi, Michael Revow, Sriram Srinivasan, Naglakshmi Chande, Robert Aichner<\/p>\n
\nOn Improving Code Mixed Speech Synthesis with Mixlingual Grapheme-to-Phoneme Model<\/strong>
\nShubham Bansal, Arijit Mukherjee, Sandeepkumar Satpal, Rupesh Mehta<\/p>\n
\nTowards Universal Text-to-Speech<\/strong>
\nJingzhou Yang, Lei He<\/p>\n
\nEnhancing Monotonicity for Robust Autoregressive Transformer TTS<\/strong>
\nXiangyu Liang, Zhiyong Wu, Runnan Li, Yanqing Liu, Sheng Zhao<\/p>\n
\nHierarchical Multi-Grained Generative Model for Expressive Speech Synthesis<\/strong>
\nYukiya Hono, Kazuna Tsuboi, Kei Sawada, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, Keiichi Tokuda<\/p>\n
\nGAN-based Data Generation for Speech Emotion Recognition<\/strong>
\nSefik Emre Eskimez<\/a>, Dimitrios Dimitriadis<\/a>, Robert Gmyr<\/a>, Kenichi Kumatani<\/p>\n
\nISCA-SAC: 7th Students Meet the Experts<\/strong>
\nPanelist: Sunayana Sitaram<\/a><\/p>\nThursday, October 29<\/h2>\n
\nAn Efficient Subband Linear Prediction for LPCNet-based Neural Synthesis<\/strong>
\nYang Cui, Xi Wang, Lei He, Frank Soong<\/a><\/p>\n
\nDeveloping RNN-T Models Surpassing High-Performance Hybrid Models with Customization Capability<\/strong>
\nJinyu Li<\/a>, Rui Zhao, Zhong Meng, Yanqing Liu, Wenning Wei, Sarangarajan Parthasarathy<\/a>, Vadim Mazalov, Zhenghao Wang, Lei He, Sheng Zhao, Yifan Gong<\/a><\/p>\n
\nSequence-level Self-learning with Multiple Hypotheses<\/strong>
\nKenichi Kumatani, Dimitrios Dimitriadis<\/a>, Robert Gmyr<\/a>, Yashesh Gaur, Sefik Emre Eskimez<\/a>, Jinyu Li<\/a>, Michael Zeng<\/a><\/p>\n
\nDiscriminative Transfer Learning for Optimizing ASR and Semantic Labeling in Task-oriented Spoken Dialog<\/strong>
\nYao Qian<\/a>, Yu Shi<\/a>, Michael Zeng<\/a><\/p>\n
\nDatasets and Benchmarks for Task-Oriented Log Dialogue Ranking Task<\/strong>
\nXinnuo Xu, Yizhe Zhang<\/a>, Lars Liden<\/a>, Sungjin Lee<\/p>\n
\nMoBoAligner: a Neural Alignment Model for Non-autoregressive TTS with Monotonic Boundary Search<\/strong>
\nNaihan Li, Shujie Liu<\/a>, Yanqing Liu, Sheng Zhao, Ming Liu<\/p>\n
\nMultiSpeech: Multi-Speaker Text to Speech with Transformer<\/strong>
\nMingjian Chen, Xu Tan<\/a>, Yi Ren, Jin Xu, Hao Sun, Sheng Zhao, Tao Qin<\/a>, Tie-Yan Liu<\/a><\/p>\n
\nMulti-Reference Neural TTS Stylization with Adversarial Cycle Consistency<\/strong>
\nMatt Whitehill, Shuang Ma, Daniel McDuff<\/a>, Yale Song<\/a><\/p>\n
\nImproving Low Resource Code-switched ASR using Augmented Code-switched TTS<\/strong>
\nYash Sharma, Basil Abraham, Karan Taneja, Preethi Jyothi<\/p>\n
\nExploring Transformers for Large-Scale Speech Recognition<\/strong>
\nLiang Lu, Changliang Liu, Jinyu Li<\/a>, Yifan Gong<\/a><\/p>\n","protected":false},"excerpt":{"rendered":""