Recurrent Support Vector Machines for Speech Recognition
- Shi-Xiong Zhang,
- Rui Zhao,
- Chaojun Liu,
- Jinyu Li,
- Yifan Gong
ICASSP
Recurrent Neural Networks (RNNs) with the Long Short-Term Memory (LSTM) architecture have demonstrated state-of-the-art performance on speech recognition. Most deep RNNs use a softmax activation function in the last layer for classification. This paper illustrates small but consistent gains from replacing the softmax layer in an RNN with Support Vector Machines (SVMs). The parameters of the RNN and the SVM are jointly learned using a sequence-level max-margin criterion instead of cross-entropy. The resulting model is termed the Recurrent SVM. Conventional SVMs require a predefined feature space and have no internal state to handle arbitrarily long temporal dependencies in sequences. The proposed Recurrent SVM uses LSTMs to learn the feature space and to capture temporal dependencies, while using the SVM in the last layer for sequence classification. The model is evaluated on a Windows Phone task for large vocabulary continuous speech recognition.
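The core idea, replacing the softmax output with a max-margin (SVM-style) objective on top of LSTM-learned features, can be illustrated with a minimal PyTorch sketch. This is not the authors' implementation: the paper trains with a sequence-level max-margin criterion over competing hypotheses, which is simplified here to a per-utterance multi-class hinge loss; all layer sizes and names are assumptions for illustration.

```python
# Minimal sketch (assumed PyTorch layout, not the authors' implementation):
# an LSTM learns the feature space, and a linear "SVM" layer produces raw
# class margins trained with a multi-class hinge loss instead of softmax
# cross-entropy, so the LSTM and SVM parameters are learned jointly.
import torch
import torch.nn as nn

class RecurrentSVM(nn.Module):
    def __init__(self, input_dim, hidden_dim, num_classes):
        super().__init__()
        self.lstm = nn.LSTM(input_dim, hidden_dim, batch_first=True)
        # Linear layer plays the role of the SVM: raw scores, no softmax.
        self.svm = nn.Linear(hidden_dim, num_classes)

    def forward(self, x):
        outputs, _ = self.lstm(x)           # (batch, time, hidden_dim)
        return self.svm(outputs[:, -1, :])  # scores from the final state

model = RecurrentSVM(input_dim=40, hidden_dim=128, num_classes=10)
hinge = nn.MultiMarginLoss(margin=1.0)      # multi-class max-margin loss
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, weight_decay=1e-4)

features = torch.randn(8, 50, 40)           # toy batch: 8 utterances, 50 frames
labels = torch.randint(0, 10, (8,))         # toy per-utterance class labels
scores = model(features)
loss = hinge(scores, labels)                # backpropagates through LSTM and SVM layer
loss.backward()
optimizer.step()
```

In this sketch the weight-decay term stands in for the SVM's norm regularizer, and the hinge loss enforces the margin; the paper's sequence-level criterion instead compares the reference hypothesis against competing hypotheses of whole utterances.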