May 12, 2019 – May 17, 2019

Microsoft @ ICASSP 2019

Location: Brighton, United Kingdom

A Pitch-Aware Approach to Single-Channel Speech Separation
Tuesday, May 14, 2019 | 1:30 PM–3:30 PM | Music Source Separation and Spatial Audio | Poster Area E

Ke Wang, Frank Soong, Lei Xie

A Sparsity Measure for Echo Density Growth in General Environments
Tuesday, May 14, 2019 | 1:30 PM–3:30 PM | Acoustic Environments and Music Analysis | Poster Area D

Helena Peic Tukuljac, Ville Pulkki, Hannes Gamper, Keith Godin, Ivan Tashev, Nikunj Raghuvanshi

Blind Room Volume Estimation from Single-Channel Noisy Speech
Tuesday, May 14, 2019 | 1:30 PM–3:30 PM | Acoustic Environments and Music Analysis | Poster Area D

Andrea Genovese, Hannes Gamper, Ville Pulkki, Nikunj Raghuvanshi, Ivan Tashev

Improving Binaural Ambisonics Decoding by Spherical Harmonics Domain Tapering and Coloration Compensation
Tuesday, May 14, 2019 | 1:30 PM–3:30 PM | Music Source Separation and Spatial Audio | Poster Area E

Christoph Hold, Hannes Gamper, Ville Pulkki, Nikunj Raghuvanshi, Ivan Tashev

Static and Dynamic State Predictions for Acoustic Model Combination
Tuesday, May 14, 2019 | 1:30 PM–3:30 PM | Deep Learning Applications I | Auditorium 2

Kshitiz Kumar, Yifan Gong

Gaussian Process LSTM Recurrent Neural Network Language Models for Speech Recognition
Tuesday, May 14, 2019 | 5:30 PM–7:30 PM | Language Modeling, ASR and Punctuation Prediction | Poster Area C

Max W.Y. Lam, Xie Chen, Shoukang Hu, Jianwei Yu, Xunying Liu, Helen Meng

Investigation of Sampling Techniques for Maximum Entropy Language Modeling Training
Tuesday, May 14, 2019 | 5:30 PM–7:30 PM | Language Modeling, ASR and Punctuation Prediction | Poster Area C

Xie Chen, Jun Zhang, Tasos Anastasakos, Fil Alleva

Recurrent Neural Network Language Model Training Using Natural Gradient
Tuesday, May 14, 2019 | 5:30 PM–7:30 PM | Language Modeling, ASR and Punctuation Prediction | Poster Area C

Jianwei Yu, Max W.Y. Lam, Xie Chen, Shoukang Hu, Songxiang Liu, Xixin Wu, Xunying Liu, Helen Meng

Towards Code-Switching ASR for End-to-End CTC Models
Tuesday, May 14, 2019 | 5:30 PM–7:30 PM | Multi-lingual Speech Recognition | Poster Area A

Ke Li, Jinyu Li, Guoli Ye, Rui Zhao, Yifan Gong

Adversarial Speaker Verification
Wednesday, May 15, 2019 | 8:30 AM–10:30 AM | Features and Robustness for Speaker Identification | Poster Area B

Zhong Meng, Yong Zhao, Jinyu Li, Yifan Gong

Attention in Recurrent Neural Networks for Ransomware Detection
Wednesday, May 15, 2019 | 8:30 AM–10:30 AM | Deep Learning III | Poster Area G

Rakshit Agrawal, Jack W. Stokes, Karthik Selvaraj, Mady Marinescu

Encrypted Speech Recognition Using Deep Polynomial Networks
Wednesday, May 15, 2019 | 8:30 AM–10:30 AM | Novel Architectures and Training Strategies for ASR | Auditorium 1

Shixiong Zhang, Yifan Gong, Dong Yu

Single-Channel Speech Extraction Using Speaker Inventory and Attention Network
Wednesday, May 15, 2019 | 8:30 AM–10:30 AM | Source Separation and Speech Enhancement I | Meeting Room 1

Xiong Xiao, Zhuo Chen, Takuya Yoshioka, Hakan Erdogan, Changliang Liu, Dimitrios Dimitriadis, Jasha Droppo, Yifan Gong

Universal Acoustic Modeling Using Neural Mixture Models
Wednesday, May 15, 2019 | 8:30 AM–10:30 AM | Novel Architectures and Training Strategies for ASR | Auditorium 1

Amit Das, Jinyu Li, Changliang Liu, Yifan Gong

Adversarial Speaker Adaptation
Wednesday, May 15, 2019 | 1:30 PM–3:30 PM | Feature Learning and Adaptation for ASR | Auditorium 1

Zhong Meng, Jinyu Li, Yifan Gong

Detecting Cyber Attacks Using Anomaly Detection with Explanations and Expert Feedback
Wednesday, May 15, 2019 | 1:30 PM–3:30 PM | Learning Theory and Methods I | Auditorium 2

Md Amran Siddiqui, Jack W. Stokes, Christian Seifert, Evan Argyle, Robert McCann, Joshua Neil, Justin Carroll

Directional Interference Suppression Using a Spatial Relative Transfer Function Feature
Wednesday, May 15, 2019 | 4:00 PM–6:00 PM | Quality Measures and Sensor Array Processing | Poster Area D

Sebastian Braun, Ivan Tashev

NN-Based Ordinal Regression for Assessing Fluency of ESL Speech
Wednesday, May 15, 2019 | 4:00 PM–6:00 PM | Training Regimes for Emotion and Sentiment Analysis | Poster Area C

Shaoguang Mao, Zhiyong Wu, Jingshuai Jiang, Peiyun Liu, Frank Soong

Non-Intrusive Speech Quality Assessment Using Neural Networks
Wednesday, May 15, 2019 | 4:00 PM–6:00 PM | Quality Measures and Sensor Array Processing | Poster Area D

Anderson R. Avila, Hannes Gamper, Chandan Reddy, Ross Cutler, Ivan Tashev, Johannes Gehrke

Conditional Teacher-Student Learning
Thursday, May 16, 2019 | 8:00 AM–10:00 AM | ASR Training Strategies and Toolkits | Poster Area A

Zhong Meng, Jinyu Li, Yong Zhao, Yifan Gong

Decoding Homomorphically Encrypted FLAC Audio Without Decryption
Thursday, May 16, 2019 | 8:00 AM–10:00 AM | Audio Security and Source Separation | Poster Area D

Yuanyuan Tang, Bin Zhu, Xiaojing Ma, P. Takis Mathiopoulos, Xia Xie, Hong Huang

Improving Layer Trajectory LSTM with Future Context Frames
Thursday, May 16, 2019 | 1:00 PM–3:00 PM | New Features, Models and Representations/Audio Visual ASR | Poster Area A

Jinyu Li, Liang Lu, Changliang Liu, Yifan Gong

Contextual Out-of-Domain Utterance Handling with Counterfeit Data Augmentation
Thursday, May 16, 2019 | 3:30 PM–5:30 PM | Dialogue | Syndicate 1

Sungjin Lee, Igor Shalyminov

Dilated Residual Network with Multi-Head Self-Attention for Speech Emotion Recognition
Thursday, May 16, 2019 | 3:30 PM–5:30 PM | Architectures for Emotion and Sentiment Analysis | Poster Area B

Runnan Li, Zhiyong Wu, Jia Jia, Sheng Zhao, Helen Meng

Attentive Adversarial Learning for Domain-Invariant Training
Thursday, May 16, 2019 | 6:00 PM–8:00 PM | Robust Speech Recognition | Poster Area A

Zhong Meng, Jinyu Li, Yifan Gong

Speech Super Resolution Generative Adversarial Network
Thursday, May 16, 2019 | 6:00 PM–8:00 PM | Audio and Speech Applications | Poster Area G

Sefik Emre Eskimez, Kazuhito Koishida

Word Characters and Phone Pronunciation Embedding for ASR Confidence Classifier
Thursday, May 16, 2019 | 6:00 PM–8:00 PM | Signal Processing for Emerging and Practical Applications | Poster Area E

Session Chair: Ivan Tashev
Kshitiz Kumar, Tasos Anastasakos, Yifan Gong

Acoustic and Lexical Sentiment Analysis for Customer Service Calls
Friday, May 17, 2019 | 8:30 AM–10:30 AM | Using Multiple Perspectives in Emotion and Sentiment Analysis | Syndicate 3

Bryan Li, Dimitrios Dimitriadis, Andreas Stolcke

Domain Adversarial Training for Improving Keyword Spotting Performance of ESL Speech
Friday, May 17, 2019 | 8:30 AM–10:30 AM | Artificial Intelligence Based Human-Machine Conversation Technology for Interactive Education | Syndicate 1

Session Chairs: Yao Qian, Helen Meng, Frank K. Soong
Jingyong Hou, Pengcheng Guo, Sining Sun, Frank K. Soong, Wenping Hu, Lei Xie

Learning Latent Representations for Style Control and Transfer in End-to-End Speech Synthesis
Friday, May 17, 2019 | 8:30 AM–10:30 AM | Speech Synthesis II | Poster Area B

Ya-Jie Zhang, Shifeng Pan, Lei He, Zhen-Hua Ling

Low-Latency Speaker-Independent Continuous Speech Separation
Friday, May 17, 2019 | 1:30 PM–3:30 PM | Speech Separation, Enhancement and Denoising | Poster Area A

Takuya Yoshioka, Zhuo Chen, Changliang Liu, Xiong Xiao, Hakan Erdogan, Dimitrios Dimitriadis

Cross Modal Audio Search and Retrieval with Joint Embeddings Based on Text and Audio
Friday, May 17, 2019 | 4:00 PM–6:00 PM | Multimedia Analysis | Poster Area C

Benjamin Elizalde, Shuayb Zarar, Bhiksha Raj