Recent Efforts Towards Efficient And Scalable Neural Waveform Coding
Acoustic signal compression techniques, which convert a floating-point waveform into a bitstream representation, serve as a cornerstone of today's data storage and telecommunication infrastructure. The rise of data-driven approaches to acoustic coding brings both potential and challenges, among which model complexity is a major concern: on the one hand, this general-purpose computational paradigm delivers superior performance; on the other hand, most codecs are deployed on low-power devices that can barely afford the computational overhead. In this talk, I will introduce several of our recent efforts towards a better trade-off between performance and efficiency in neural speech and audio coding. I will present cascaded cross-module residual learning, which performs multistage quantization with deep learning techniques; in addition, I will discuss a collaborative quantization scheme that simultaneously binarizes the linear predictive coefficients and the corresponding residuals. If time permits, I will also discuss a novel perceptually salient objective function with psychoacoustic calibration.
Speaker Details
Kai Zhen is a Ph.D. candidate (ABD) in Computer Science and Cognitive Science at Indiana University, advised by Prof. Minje Kim. He works on efficient and scalable neural waveform coding systems. He completed two machine learning and relevance internships at LinkedIn, in 2018 and 2019, followed by an internship at Amazon Alexa in 2020.
- Series: Microsoft Research Talks
- Date:
- Speakers: Kai Zhen
- Affiliation: Indiana University