FastSpeech: New text-to-speech model improves on speed, accuracy, and controllability
Text to speech (TTS) has attracted a lot of attention recently due to advancements in deep learning. Neural network-based TTS models (such as Tacotron 2, DeepVoice 3 and Transformer TTS) have outperformed conventional concatenative and statistical parametric approaches in terms of speech quality. Neural network-based TTS models usually first generate a mel-scale spectrogram (or mel-spectrogram) autoregressively from text input and then synthesize speech from the mel-spectrogram using a vocoder. (Note: the Mel scale is used…
décembre 2019
Blog de recherche Microsoft