{"id":699085,"date":"2020-10-19T15:23:24","date_gmt":"2020-10-19T22:23:24","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-event&p=699085"},"modified":"2020-10-21T16:07:15","modified_gmt":"2020-10-21T23:07:15","slug":"interspeech-2020","status":"publish","type":"msr-event","link":"https:\/\/www.microsoft.com\/en-us\/research\/event\/interspeech-2020\/","title":{"rendered":"Microsoft at INTERSPEECH 2020"},"content":{"rendered":"

Website<\/strong>: INTERSPEECH 2020 (opens in new tab)<\/span><\/a><\/p>\n","protected":false},"excerpt":{"rendered":"

Microsoft is proud to be a gold sponsor of INTERSPEECH 2020.<\/p>\n","protected":false},"featured_media":0,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"msr_startdate":"2020-10-25","msr_enddate":"2020-10-29","msr_location":"Virtual","msr_expirationdate":"","msr_event_recording_link":"","msr_event_link":"","msr_event_link_redirect":false,"msr_event_time":"","msr_hide_region":false,"msr_private_event":false,"footnotes":""},"research-area":[13545],"msr-region":[256048],"msr-event-type":[197941],"msr-video-type":[],"msr-locale":[268875],"msr-program-audience":[],"msr-post-option":[],"msr-impact-theme":[],"class_list":["post-699085","msr-event","type-msr-event","status-publish","hentry","msr-research-area-human-language-technologies","msr-region-global","msr-event-type-conferences","msr-locale-en_us"],"msr_about":"Website<\/strong>: INTERSPEECH 2020<\/a>","tab-content":[{"id":0,"name":"About","content":"Microsoft is proud to be a gold sponsor of INTERSPEECH 2020<\/a>. See more details on our contributions on the sessions tab."},{"id":1,"name":"Sessions","content":"All times are displayed in GMT +8<\/em>\r\n

Sunday, October 25

20:00 – 21:30 | Tutorial B-2-1
Neural Approaches to Conversational Information Retrieval
Jianfeng Gao, Chenyan Xiong, Paul Bennett

20:00 – 21:30 | Tutorial B-3-1
Neural Models for Speaker Diarization in the Context of Speech Recognition
Kyu J. Han, Tae Jin Park, Dimitrios Dimitriadis

21:45 – 23:15 | Tutorial B-2-2
Neural Approaches to Conversational Information Retrieval
Jianfeng Gao, Chenyan Xiong, Paul Bennett

21:45 – 23:15 | Tutorial B-3-2
Neural Models for Speaker Diarization in the Context of Speech Recognition
Kyu J. Han, Tae Jin Park, Dimitrios Dimitriadis

Monday, October 26

19:15 – 20:15 | ASR neural network architectures I
On the Comparison of Popular End-to-End Models for Large Scale Speech Recognition (Microsoft Research Asia)
Jinyu Li, Yu Wu, Yashesh Gaur, Chengyi Wang, Rui Zhao, Shujie Liu

19:15 – 20:15 | ASR neural network architectures I
Joint Speaker Counting, Speech Recognition, and Speaker Identification for Overlapped Speech of Any Number of Speakers
Naoyuki Kanda, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Tianyan Zhou, Takuya Yoshioka

19:15 – 20:15 | Multi-channel speech enhancement
Online directional speech enhancement using geometrically constrained independent vector analysis
Li Li, Kazuhito Koishida, Shoji Makino

19:15 – 20:15 | Multi-channel speech enhancement
An End-to-end Architecture of Online Multi-channel Speech Separation
Jian Wu, Zhuo Chen, Jinyu Li, Takuya Yoshioka, Zhili Tan

19:15 – 20:15 | Speech Signal Representation
Robust pitch regression with voiced/unvoiced classification in nonstationary noise environments
Dung Tran, Uros Batricevic, Kazuhito Koishida

19:15 – 20:15 | Speaker Diarization
Online Speaker Diarization with Relation Network
Xiang Li, Yucheng Zhao, Chong Luo, Wenjun Zeng

19:15 – 20:15 | Speaker Diarization
Speaker attribution with voice profiles by graph-based semi-supervised learning
Jixuan Wang (University of Toronto), Xiong Xiao, Jian Wu, Ranjani Ramamurthy, Frank Rudzicz (University of Toronto), Michael Brudno (University of Toronto)

19:15 – 20:15 | Noise robust and distant speech recognition
Neural Speech Separation Using Spatially Distributed Microphones
Dongmei Wang, Zhuo Chen, Takuya Yoshioka

20:30 – 21:30 | ASR neural network architectures and training I
Fast and Slow Acoustic Model
Kshitiz Kumar, Emilian Stoimenov, Hosam Khalil, Jian Wu

20:30 – 21:30 | Evaluation of Speech Technology Systems and Methods for Resource Construction and Annotation
Neural Zero-Inflated Quality Estimation Model For Automatic Speech Recognition System
Kai Fan, Bo Li, Jiayi Wang, Shiliang Zhang, Boxing Chen, Niyu Ge, Zhi-Jie Yan

20:30 – 21:30 | ASR model training and strategies
Semantic Mask for Transformer based End-to-End Speech Recognition
Chengyi Wang, Yu Wu, Yujiao Du, Jinyu Li, Shujie Liu, Liang Lu, Shuo Ren, Guoli Ye, Sheng Zhao, Ming Zhou

20:30 – 21:30 | ASR model training and strategies
A Federated Approach in Training Acoustic Models
Dimitrios Dimitriadis, Kenichi Kumatani, Robert Gmyr, Yashesh Gaur, Sefik Emre Eskimez

21:45 – 22:45 | Cross/multi-lingual and code-switched speech recognition
A 43 Language Multilingual Punctuation Prediction Neural Network Model
Xinxing Li, Edward Lin

21:45 – 22:45 | Singing Voice Computing and Processing in Music
Transfer Learning for Improving Singing-Voice Detection in Polyphonic Instrumental Music
Yuanbo Hou, Frank Soong, Jian Luan, Shengchen Li

21:45 – 22:45 | Acoustic model adaptation for ASR
Rapid RNN-T Adaptation Using Personalized Speech Synthesis and Neural Language Generator
Yan Huang, Jinyu Li, Lei He, Wenning Wei, William Gale, Yifan Gong

21:45 – 22:45 | Singing and Multimodal Synthesis
Adversarially Trained Multi-Singer Sequence-To-Sequence Singing Synthesizer
Jie Wu, Jian Luan

21:45 – 22:45 | Singing and Multimodal Synthesis
XiaoiceSing: A High-Quality and Integrated Singing Voice Synthesis System
Peiling Lu, Jie Wu, Jian Luan, Xu Tan, Li Zhou

21:45 – 22:45 | Student Events
ISCA-SAC: 2nd Mentoring Event
Mentor: Jinyu Li

Tuesday, October 27

19:15 – 20:15 | Feature extraction and distant ASR
Bandpass Noise Generation and Augmentation for Unified ASR
Kshitiz Kumar, Bo Ren, Yifan Gong, Jian Wu

19:15 – 20:15 | Search for speech recognition
Combination of end-to-end and hybrid models for speech recognition
Jeremy Heng Meng Wong, Yashesh Gaur, Rui Zhao, Liang Lu, Eric Sun, Jinyu Li, Yifan Gong

Wednesday, October 28

19:15 – 20:15 | Streaming ASR
1-D Row-Convolution LSTM: Fast Streaming ASR at Accuracy Parity with LC-BLSTM
Kshitiz Kumar, Chaojun Liu, Yifan Gong, Jian Wu

19:15 – 20:15 | Streaming ASR
Low Latency End-to-End Streaming Speech Recognition with a Scout Network
Chengyi Wang, Yu Wu, Liang Lu, Shujie Liu, Jinyu Li, Guoli Ye, Ming Zhou

19:15 – 20:15 | Streaming ASR
Transfer Learning Approaches for Streaming End-to-End Speech Recognition System
Vikas Joshi, Rui Zhao, Rupesh Mehta, Kshitiz Kumar, Jinyu Li

19:15 – 20:15 | Applications of ASR
SpecMark: A Spectral Watermarking Framework for IP Protection of Speech Recognition Systems
Huili Chen, Bita Darvish Rouhani, Farinaz Koushanfar

19:15 – 20:15 | Single-channel speech enhancement I
Low-Latency Single Channel Speech Dereverberation using U-Net Convolutional Neural Networks
Ahmet E. Bulut, Kazuhito Koishida

19:15 – 20:15 | Single-channel speech enhancement I
Single-channel speech enhancement by subspace affinity minimization
Dung Tran, Kazuhito Koishida

19:15 – 20:15 | Deep Noise Suppression Challenge
The INTERSPEECH 2020 Deep Noise Suppression Challenge: Datasets, Subjective Testing Framework, and Challenge Results
Chandan Karadagur Ananda Reddy, Vishak Gopal, Ross Cutler, Ebrahim Beyrami, Roger Cheng, Harishchandra Dubey, Sergiy Matusevych, Robert Aichner, Ashkan Aazami, Sebastian Braun, Puneet Rana, Sriram Srinivasan, Johannes Gehrke

20:30 – 21:30 | Spoken Term Detection
Re-weighted Interval Loss for Handling Data Imbalance Problem of End-to-End Keyword Spotting
Kun Zhang, Zhiyong Wu, Daode Yuan, Jian Luan, Jia Jia, Helen Meng, Binheng Song

20:30 – 21:30 | Training strategies for ASR
Serialized Output Training for End-to-End Overlapped Speech Recognition
Naoyuki Kanda, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Takuya Yoshioka

20:30 – 21:30 | Speech transmission & coding
An Open Source Implementation of ITU-T Recommendation P.808 with Validation
Babak Naderi, Ross Cutler

20:30 – 21:30 | Speech transmission & coding
DNN No-Reference PSTN Speech Quality Prediction
Gabriel Mittag, Ross Cutler, Yasaman Hosseinkashi, Michael Revow, Sriram Srinivasan, Naglakshmi Chande, Robert Aichner

20:30 – 21:30 | Speech Synthesis: Multilingual and Cross-lingual approaches
On Improving Code Mixed Speech Synthesis with Mixlingual Grapheme-to-Phoneme Model
Shubham Bansal, Arijit Mukherjee, Sandeepkumar Satpal, Rupesh Mehta

21:45 – 22:45 | Speech Synthesis Paradigms and Methods II
Towards Universal Text-to-Speech
Jingzhou Yang, Lei He

21:45 – 22:45 | Speech Synthesis Paradigms and Methods II
Enhancing Monotonicity for Robust Autoregressive Transformer TTS
Xiangyu Liang, Zhiyong Wu, Runnan Li, Yanqing Liu, Sheng Zhao

21:45 – 22:45 | Speech Synthesis: Prosody and Emotion
Hierarchical Multi-Grained Generative Model for Expressive Speech Synthesis
Yukiya Hono, Kazuna Tsuboi, Kei Sawada, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, Keiichi Tokuda

21:45 – 22:45 | Speech Synthesis: Prosody and Emotion
GAN-based Data Generation for Speech Emotion Recognition
Sefik Emre Eskimez, Dimitrios Dimitriadis, Robert Gmyr, Kenichi Kumatani

21:45 – 22:45 | Student Events
ISCA-SAC: 7th Students Meet the Experts
Panelist: Sunayana Sitaram

Thursday, October 29

19:15 – 20:15 | Speech Synthesis: Neural Waveform Generation II
An Efficient Subband Linear Prediction for LPCNet-based Neural Synthesis
Yang Cui, Xi Wang, Lei He, Frank Soong

19:15 – 20:15 | ASR neural network architectures and training II
Developing RNN-T Models Surpassing High-Performance Hybrid Models with Customization Capability
Jinyu Li, Rui Zhao, Zhong Meng, Yanqing Liu, Wenning Wei, Sarangarajan Parthasarathy, Vadim Mazalov, Zhenghao Wang, Lei He, Sheng Zhao, Yifan Gong

19:15 – 20:15 | New Trends in self-supervised speech processing
Sequence-level Self-learning with Multiple Hypotheses
Kenichi Kumatani, Dimitrios Dimitriadis, Robert Gmyr, Yashesh Gaur, Sefik Emre Eskimez, Jinyu Li, Michael Zeng

19:15 – 20:15 | Spoken Dialogue System
Discriminative Transfer Learning for Optimizing ASR and Semantic Labeling in Task-oriented Spoken Dialog
Yao Qian, Yu Shi, Michael Zeng

19:15 – 20:15 | Spoken Dialogue System
Datasets and Benchmarks for Task-Oriented Log Dialogue Ranking Task
Xinnuo Xu, Yizhe Zhang, Lars Liden, Sungjin Lee

19:15 – 20:15 | Speech Synthesis: Toward End-to-End Synthesis
MoBoAligner: a Neural Alignment Model for Non-autoregressive TTS with Monotonic Boundary Search
Naihan Li, Shujie Liu, Yanqing Liu, Sheng Zhao, Ming Liu

19:15 – 20:15 | Speech Synthesis: Toward End-to-End Synthesis
MultiSpeech: Multi-Speaker Text to Speech with Transformer
Mingjian Chen, Xu Tan, Yi Ren, Jin Xu, Hao Sun, Sheng Zhao, Tao Qin, Tie-Yan Liu

20:30 – 21:30 | Speech Synthesis: Prosody Modeling
Multi-Reference Neural TTS Stylization with Adversarial Cycle Consistency
Matt Whitehill, Shuang Ma, Daniel McDuff, Yale Song

21:45 – 22:45 | Multilingual and code-switched ASR
Improving Low Resource Code-switched ASR using Augmented Code-switched TTS
Yash Sharma, Basil Abraham, Karan Taneja, Preethi Jyothi

21:45 – 22:45 | ASR neural network architectures II – Transformers
Exploring Transformers for Large-Scale Speech Recognition
Liang Lu, Changliang Liu, Jinyu Li, Yifan Gong