CoSSAT: Code-Switched Speech Annotation Tool
- Sanket Shah
- Pratik Joshi
- Sunayana Sitaram
- Sebastin Santy
EMNLP AnnoNLP workshop
Code-switching refers to the alternation of two or more languages in a conversation or utterance and is common in multilingual communities across the world. Building code-switched speech and natural language processing systems is challenging due to the lack of annotated speech and text data. We present CoSSAT, a speech annotation interface that helps annotators transcribe code-switched speech faster, more easily, and more accurately than a traditional interface by displaying candidate words from monolingual speech recognizers. We conduct a user study on the transcription of Hindi-English code-switched speech with 10 annotators and describe quantitative and qualitative results.