Pushing the frontier of neural text to speech

May 20, 2021
Xu Tan | Microsoft Research

In the popular field of text to speech (TTS), the goal is to transform the written or printed word into speech that is natural and intelligible. Today, the technology is used in products and services to help people who are blind or have low vision consume digital content, to power personal digital assistants that sound more realistic, and to make it easier to do two things at once, such as listening to an online article while washing dishes. Although neural network-based end-to-end TTS has improved the quality of synthesized speech, advancing neural TTS and integrating it more easily into product development and deployment requires overcoming a variety of remaining challenges.

In this webinar, Senior Researcher Xu Tan will talk about these challenges, specifically the high computational cost and slow inference speed in online serving; word skipping and repeating issues, poor voice quality, and lack of voice controllability; the large amounts of training data needed for improved voice synthesis; and the practical challenges in TTS voice adaptation. He’ll introduce his team’s work addressing these challenges—including fast TTS, end-to-end TTS, low-resource TTS, and adaptive TTS—as well as discuss other critical questions and opportunities to pursue in the space.

Together, you’ll explore:

An overview of text to speech, including its evolution
The important challenges in neural text to speech and how to address them with dedicated research
How to factor product development into your research

Resource list:

Text to Speech (Project page)
Xu Tan (Publications page)
Speech Research Repository Master List (GitHub)
FastSpeech: Fast, Robust and Controllable Text to Speech (GitHub)
FastSpeech 2: Fast and High-Quality End-to-End Text to Speech (GitHub)
AdaSpeech: Adaptive Text to Speech for Custom Voice (GitHub)
AdaSpeech 2: Adaptive Text to Speech with Untranscribed Data (GitHub)
LightSpeech: Lightweight and Fast Text to Speech with Neural Architecture Search (GitHub)
LRSpeech: Extremely Low-Resource Speech Synthesis and Recognition (GitHub)
Neural Text-to-Speech previews five new languages with innovative models in the low-resource setting (blog)
Microsoft Azure Text to Speech
Microsoft Azure Custom Voice
Xu Tan (Researcher profile)

This on-demand webinar features a previously recorded Q&A session and open captioning.

This webinar originally aired on May 20, 2021

Explore more Microsoft Research webinars: https://aka.ms/msrwebinars

Speaker: Xu Tan, Principal Research Manager
Research Lab: Microsoft Research Lab - Asia
Group: Deep and Reinforcement Learning Group
Project: Text to Speech
