GAN-based Data Generation for Speech Emotion Recognition

Sefik Emre Eskimez; Dimitrios Dimitriadis; Robert Gmyr; Kenichi Kumatani

INTERSPEECH 2020 | October 2020

In this work, we propose a GAN-based method to generate synthetic data for speech emotion recognition. Specifically, we investigate the use of GANs to capture the data manifold when the data is eyes-off, i.e., when we can train networks on the data but cannot copy it from the clients. We propose a CNN-based GAN with spectral normalization on both the generator and the discriminator, both of which are pre-trained on large unlabeled speech corpora. We show that our method provides better speech emotion recognition performance than a strong baseline. Furthermore, we show that even after the data on the client is lost, our model can generate similar data that can be used for model bootstrapping in the future. Although we evaluated our method on speech emotion recognition, it can be applied to other tasks.
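As a rough illustration of the spectral normalization mentioned in the abstract, the sketch below estimates the largest singular value of a weight matrix by power iteration and divides the weights by it, so the normalized matrix has spectral norm close to 1. This is a minimal, dependency-free sketch and not the authors' implementation: the function names, the 2x2 example matrix, and the iteration count are assumptions made for illustration, and the abstract does not say how the layers were actually normalized. For a convolutional layer, the kernel is commonly reshaped into a 2-D matrix before this step.

```python
import math
import random


def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def transpose(W):
    """Return the transpose of W as a list of rows."""
    return [list(col) for col in zip(*W)]


def l2_normalize(v, eps=1e-12):
    """Scale v to unit Euclidean length (eps guards against division by zero)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / (norm + eps) for x in v]


def spectral_norm(W, n_iters=50, seed=0):
    """Estimate the largest singular value of W by power iteration."""
    rng = random.Random(seed)
    Wt = transpose(W)
    # Random starting vector with one entry per row of W.
    u = l2_normalize([rng.gauss(0.0, 1.0) for _ in W])
    for _ in range(n_iters):
        v = l2_normalize(matvec(Wt, u))  # approximate right singular vector
        u = l2_normalize(matvec(W, v))   # approximate left singular vector
    # sigma is approximately u^T W v.
    return sum(ui * wi for ui, wi in zip(u, matvec(W, v)))


def spectrally_normalize(W, n_iters=50):
    """Return W divided by its estimated spectral norm."""
    sigma = spectral_norm(W, n_iters)
    return [[w / sigma for w in row] for row in W]


# Hypothetical example: a diagonal matrix whose singular values are 3 and 1.
W = [[3.0, 0.0], [0.0, 1.0]]
sigma_before = spectral_norm(W)
W_sn = spectrally_normalize(W)
sigma_after = spectral_norm(W_sn)
```

Here `sigma_before` approximates the largest singular value of `W`, and `sigma_after` should be close to 1. Deep learning frameworks typically run only one power-iteration step per training update and reuse the vector `u` across updates, rather than iterating to convergence as this sketch does.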