Deep Learning in Conversational Language Understanding

  • Gokhan Tur,
  • Asli Celikyilmaz,
  • Xiaodong He,
  • Dilek Hakkani-Tur,
  • Li Deng

in Deep Learning in Natural Language Processing (eds. Li Deng and Yang Liu)

Published by Springer | 2017

In the last decade, a variety of practical goal-oriented conversational language understanding (CLU) systems have been built, especially as part of virtual personal assistants such as Google Assistant, Amazon Alexa, Microsoft Cortana, and Apple Siri.

In contrast to speech recognition, which aims to automatically transcribe the sequence of spoken words, CLU is not a clearly defined task. At the highest level, CLU’s goal is to extract “meaning” from natural language in the context of conversations, spoken or in text. In practice, this may mean any practical application that allows its users to perform some task with natural (optionally spoken) language. In the literature, the term spoken language understanding (SLU) is often used to denote the task of understanding natural language in spoken form, in conversation or otherwise. Thus, the CLU discussed in this chapter and book is closely related to, and sometimes synonymous with, SLU in the literature.
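As a concrete illustration of what such extracted “meaning” might look like in a goal-oriented setting, the following is a minimal sketch, not taken from the chapter, that represents a single user utterance as a semantic frame; the utterance, field names, and slot values are invented purely for exposition.

```python
from dataclasses import dataclass, field
from typing import Dict

# Toy semantic-frame representation of one user turn in a goal-oriented
# conversation. The fields and example values below are illustrative only,
# not a prescribed schema from the chapter.
@dataclass
class SemanticFrame:
    utterance: str
    domain: str                       # e.g., "flights"
    intent: str                       # e.g., "find_flight"
    slots: Dict[str, str] = field(default_factory=dict)

frame = SemanticFrame(
    utterance="find me flights from Seattle to Boston tomorrow",
    domain="flights",
    intent="find_flight",
    slots={"origin": "Seattle", "destination": "Boston", "date": "tomorrow"},
)
print(frame.intent, frame.slots)
# find_flight {'origin': 'Seattle', 'destination': 'Boston', 'date': 'tomorrow'}
```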

Here we further elaborate on the connections among speech recognition, CLU/SLU, and natural language understanding in text form. Speech recognition does not concern understanding; it is responsible only for converting language from spoken form to text form. Errors in speech recognition can be viewed as “noise” by downstream language processing systems. Handling this type of noisy NLP problem can be connected to the problem of noisy speech recognition, where the “noise” comes from acoustic environments (as opposed to recognition errors).

For SLU and CLU with spoken input, the inevitable errors in speech recognition make understanding harder than when the input is text that is free of recognition errors. In the long history of SLU/CLU research, the difficulties caused by speech recognition errors forced the domains of SLU/CLU to be substantially narrower than those of language understanding in text form \cite{SLUBook-chapter}. However, due to the recent huge success of deep learning in speech recognition, recognition errors have been dramatically reduced, leading to increasingly broader application domains in current CLU systems.

One category of conversational understanding tasks is rooted in early artificial intelligence (AI) work, such as the MIT ELIZA system built in the 1960s, and is mainly used for chit-chat systems that mimic understanding. For example, if the user says “I am depressed”, ELIZA would reply “Are you depressed often?”.

The other extreme is building generic understanding capabilities using deeper semantics; such systems have been demonstrated to be successful only for very limited domains. These systems are typically heavily knowledge-based and rely on formal semantic interpretation, defined as mapping sentences into their logical forms. In its simplest form, a logical form is a context-independent representation of a sentence covering its predicates and arguments. For example, if the sentence is “John loves Mary”, the logical form would be love(john, mary). Following these ideas, some researchers worked toward building universal semantic grammars (or interlinguas), which assume that all languages have a shared set of semantic features. Such interlingua-based approaches also heavily influenced machine translation research until the late 1990s, before statistical approaches began to dominate.
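To make the predicate-argument view of a logical form concrete, here is a minimal, purely illustrative sketch in Python; the toy parser, its fixed subject-verb-object pattern, and the tiny lemma table are assumptions for exposition, not part of any system described in this chapter.

```python
from dataclasses import dataclass
from typing import Tuple

# A minimal predicate-argument representation of a logical form.
# Real semantic parsers use far richer formalisms (e.g., lambda calculus
# or frame semantics); this is only a toy illustration.
@dataclass(frozen=True)
class LogicalForm:
    predicate: str               # e.g., "love"
    arguments: Tuple[str, ...]   # e.g., ("john", "mary")

    def __str__(self) -> str:
        return f"{self.predicate}({', '.join(self.arguments)})"

def toy_parse(sentence: str) -> LogicalForm:
    """Hypothetical parser that only handles 'subject verb object' sentences."""
    subj, verb, obj = sentence.lower().rstrip(".").split()
    lemma = {"loves": "love"}.get(verb, verb)  # tiny hand-written lemma table
    return LogicalForm(predicate=lemma, arguments=(subj, obj))

print(toy_parse("John loves Mary"))  # -> love(john, mary)
```

The point of the sketch is only the shape of the output: a context-independent predicate with its arguments, stripped of surface word order and inflection.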

In this chapter, we review state-of-the-art deep-learning-based CLU methods in detail, mainly focusing on these three tasks.