Microsoft at ICASSP 2020 in Barcelona, Spain

May 4, 2020 - May 8, 2020

Microsoft @ ICASSP 2020

Location: Virtual

Tuesday, May 5

11:30 – 13:30 CEST

MLSP-P2: Applications in Speech and Audio
Multi-Label Sound Event Retrieval Using A Deep Learning-Based Siamese Structure With A Pairwise Presence Matrix
Jianyu Fan, Eric Nichols, Daniel Tompkins, Ana Elisa Méndez Méndez, Benjamin Elizalde, Philippe Pasquier

Wednesday, May 6

9:00 – 11:00 CEST

AUD-P4: Feedback, Noise, and Reverberation
Joint Beamforming and Reverberation Cancellation Using a Constrained Kalman Filter with Multichannel Linear Prediction
Sahar Hashemgeloogerdi, Sebastian Braun

AUD-P4: Feedback, Noise, and Reverberation
Predicting Word Error Rate for Reverberant Speech
Hannes Gamper, Dimitra Emmanouilidou, Sebastian Braun, Ivan Tashev

SPE-P5: Deep Speaker Recognition Models
Improving Deep CNN Networks with Long Temporal Context for Text-independent Speaker Verification
Yong Zhao, Tianyan Zhou, Zhuo Chen, Jian Wu

9:20 – 9:40 CEST

SPE-L6: Speech Enhancement II: Single Channel
Low-Latency Single Channel Speech Enhancement Using U-Net Convolutional Neural Networks
Ahmet E. Bulut, Kazuhito Koishida

11:30 – 13:30 CEST

SAM-P3: Sparsity, Super-Resolution and Imaging
Low-Rank Toeplitz Matrix Estimation Via Random Ultra-Sparse Rulers
Hannah Lawrence, Jerry Li, Cameron Musco, Christopher Musco

SPE-P8: Robust Speech Recognition
A Practical Two-Stage Training Strategy for Multi-Stream End-to-End Speech Recognition
Ruizhi Li, Gregory Sell, Xiaofei Wang, Shinji Watanabe, Hynek Hermansky

16:30 – 16:50 CEST

IFS-L2: Privacy, Biometrics and Information Security
Privacy-Preserving Phishing Web Page Classification Via Fully Homomorphic Encryption
Edward Chou, Arun Gururajan, Kim Laine, Nitin Kumar Goel, Anna Bertiger, Jack W. Stokes

16:30 – 18:30 CEST

HLT-P1: Spoken Language Understanding and Dialogue I
Fast Domain Adaptation for Goal-Oriented Dialogue Using A Hybrid Generative-Retrieval Transformer
Igor Shalyminov, Alessandro Sordoni, Adam Atkinson, Hannes Schulz

SPE-P9: End-to-end Speech Recognition III: General Topics
Exploring Pre-training with Alignments for RNN Transducer based End-to-End Speech Recognition
Hu Hu, Rui Zhao, Jinyu Li, Liang Lu, Yifan Gong

Thursday, May 7

9:00 – 11:00 CEST

HLT-P2: Speech and Language Analysis
Combining Acoustics, Content and Interaction Features to Find Hot Spots in Meetings
Dave Makhervaks, William Hinthorn, Dimitrios Dimitriadis, Andreas Stolcke

10:20 – 10:40 CEST

AUD-L6: Acoustic Environments and Spatial Audio II
Fast Acoustic Scattering Using Convolutional Neural Networks
Ziqi Fan, Vibhav Vineet, Hannes Gamper, Nikunj Raghuvanshi

10:40 – 11:00 CEST

SPE-L11: Speech Separation and Extraction I: Single Channel
An Online Speaker-Aware Speech Separation Approach Based on Time-Domain Representation
Hui Wang, Yan Song, Zeng-Xi Li, Ian McLoughlin, Li-Rong Dai

11:30 – 13:30 CEST

SPE-P12: Machine Learning for Speech Synthesis II
Improving LPCNET-Based Text-to-Speech with Linear Prediction-Structured Mixture Density Network
Min-Jae Hwang, Eunwoo Song, Ryuichi Yamamoto, Frank Soong, Hong-Goo Kang

SPE-P13: Speech Separation and Extraction III
Continuous Speech Separation: Dataset and Analysis
Zhuo Chen, Takuya Yoshioka, Liang Lu, Tianyan Zhou, Zhong Meng, Yi Luo, Jian Wu, Xiong Xiao, Jinyu Li

12:10 – 12:30 CEST

SPE-L12: Speech Separation and Extraction II: Multi-channel
End-to-End Microphone Permutation and Number Invariant Multi-Channel Speech Separation
Yi Luo, Zhuo Chen, Nima Mesgarani, Takuya Yoshioka

16:30 – 18:30 CEST

MMSP-P3: Multimedia Signal Processing
Supervised Deep Hashing for Efficient Audio Event Retrieval
Arindam Jati, Dimitra Emmanouilidou

MMSP-P3: Multimedia Signal Processing
Multimodal Active Speaker Detection and Virtual Cinematography for Video Conferencing
Ross Cutler, Ramin Mehran, Sam Johnson, Cha Zhang, Adam Kirk, Oliver Whyte, Adarsh Kowdle

SPE-P15: Speech Recognition: Adaptation
L-Vector: Neural Label Embedding for Domain Adaptation
Zhong Meng, Hu Hu, Jinyu Li, Changliang Liu, Yan Huang, Yifan Gong, Chin-Hui Lee

SPE-P15: Speech Recognition: Adaptation
Acoustic Model Adaptation for Presentation Transcription and Intelligent Meeting Assistant Systems
Yan Huang, Yifan Gong

SPE-P15: Speech Recognition: Adaptation
Using Personalized Speech Synthesis and Neural Language Generator for Rapid Speaker Adaptation
Yan Huang, Lei He, Wenning Wei, William Gale, Jinyu Li, Yifan Gong

SS-P1: Signal Processing Education: Trends and Innovations
A Dataset for Measuring Reading Levels in India at Scale
Dolly Agarwal, Jayant Gupchup, Nishant Baghel

17:30 – 17:30 CEST

IDSP-L2: Industry Session on Large-Scale Distributed Learning Strategies
Parallelizing Adam Optimizer with Blockwise Model-Update Filtering
Kai Chen, Haisong Ding, Qiang Huo

Friday, May 8

8:00 – 10:00 CEST

IFS-P1: Information Hiding, Biometrics and Security
Texception: A Character/Word-Level Deep Learning Model for Phishing URL Detection
Farid Tajaddodianfar, Jack W. Stokes, Arun Gururajan

SAM-P6: Detection, Estimation and Classification
Static Visual Spatial Priors For DOA Estimation
Pawel Swietojanski, Ondrej Miksik

SPE-P16: Word Spotting
Adaptation of RNN Transducer with Text-to-Speech Technology for Keyword Spotting
Eva Sharma, Guoli Ye, Wenning Wei, Rui Zhao, Yao Tian, Jian Wu, Lei He, Ed Lin, Yifan Gong

SPE-P17: Speech Enhancement IV
AV(SE)²: Audio-Visual Squeeze-Excite Speech Enhancement
Michael Iuzzolino, Kazuhito Koishida

8:20 – 8:40 CEST

HLT-L2: Language Modeling
Low-bit Quantization of Recurrent Neural Network Language Models Using Alternating Direction Methods of Multipliers
Junhao Xu, Xie Chen, Shoukang Hu, Jianwei Yu, Xunying Liu, Helen Mei-Ling Meng

9:40 – 10:00 CEST

MLSP-L10: Deep Neural Network Structures
Neural Attentive Multiview Machines
Oren Barkan, Ori Katz, Noam Koenigstein

11:45 – 13:45 CEST

AUD-P11: Signal Enhancement and Restoration II
Geometrically Constrained Independent Vector Analysis for Directional Speech Enhancement
Li Li, Kazuhito Koishida

AUD-P11: Signal Enhancement and Restoration II
Weighted Speech Distortion Losses for Neural-Network-Based Real-Time Speech Enhancement
Yangyang Xia, Sebastian Braun, Chandan Reddy, Harishchandra Dubey, Ross Cutler, Ivan Tashev

HLT-P5: Multilingual Processing of Language
Addressing Accent Mismatch in Mandarin-English Code-Switching Speech Recognition
Zhili Tan, Xinghua Fan, Hui Zhu, Ed Lin

IFS-P2: Anonymization, Security and Privacy
Detection of Malicious VBScript Using Static and Dynamic Analysis with Recurrent Deep Learning
Jack W. Stokes, Rakshit Agrawal, Geoff McDonald

SPE-P19: Machine Learning for Speech Synthesis III
ESPNET-TTS: Unified, Reproducible, and Integratable Open Source End-to-End Text-to-Speech Toolkit
Tomoki Hayashi, Ryuichi Yamamoto, Katsuki Inoue, Takenori Yoshimura, Shinji Watanabe, Tomoki Toda, Kazuya Takeda, Yu Zhang, Xu Tan

SPE-P20: Speech Recognition: Acoustic Modelling II
High-Accuracy and Low-Latency Speech Recognition with Two-Head Contextual Layer Trajectory LSTM Model
Jinyu Li, Rui Zhao, Eric Sun, Jeremy Wong, Amit Das, Zhong Meng, Yifan Gong

12:25 – 12:45 CEST

SPE-L16: Speaker Diarization
Speaker Diarization with Session-Level Speaker Embedding Refinement Using Graph Neural Networks
Jixuan Wang, Xiong Xiao, Jian Wu, Ranjani Ramamurthy, Frank Rudzicz, Michael Brudno

13:05 – 13:25 CEST

SPE-L16: Speaker Diarization
A Memory Augmented Architecture for Continuous Speaker Identification in Meetings
Nikolaos Flemotomos, Dimitrios Dimitriadis

15:15 – 17:15 CEST

SPE-P21: Voice Conversion
An Improved Frame-Unit-Selection Based Voice Conversion System Without Parallel Training Data
Feng-Long Xie, Xin-Hui Li, Bo Liu, Yi-Bin Zheng, Li Meng, Li Lu, Frank K. Soong

16:15 – 16:30 CEST

MLSP-L11: Attention Needs
Attentive Item2vec: Neural Attentive User Representations
Oren Barkan, Avi Caciularu, Ori Katz, Noam Koenigstein
