Temporal Supervised Learning for Inferring a Dialog Policy from Example Conversations

  • Lihong Li,
  • He He,
  • Jason Williams

Proceedings of the IEEE Spoken Language Technology Workshop (SLT)

Published by IEEE - Institute of Electrical and Electronics Engineers

To appear in the 2014 IEEE Spoken Language Technology Workshop.

This paper tackles the problem of learning a dialog policy from example dialogs, such as Wizard-of-Oz dialogs in which an expert (a person) plays the role of the system. Learning in this setting is challenging because dialog is a temporal process in which actions affect the future course of the conversation; in other words, dialog requires planning. Past work has addressed this problem with either conventional supervised learning or reinforcement learning. Reinforcement learning provides a principled approach to planning, but requires resources beyond a fixed corpus of examples, such as a dialog simulator or a reward function. Conventional supervised learning, by contrast, operates directly on example dialogs but does not properly account for planning. We introduce a new algorithm, Temporal Supervised Learning, which learns directly from example dialogs while also properly accounting for planning. The key idea is to choose the next dialog action so as to maximize the expected discounted accuracy from the current turn until the end of the dialog. On a simulated dialog testbed in the calendar domain, we show that a dialog manager trained with temporal supervised learning substantially outperforms a baseline trained with conventional supervised learning.
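
To make the "expected discounted accuracy" objective concrete, here is a minimal, hypothetical sketch of action selection under that criterion. It assumes we already have, for each candidate next action, estimated per-turn probabilities of agreeing with the expert for the remainder of the dialog; the function and variable names are illustrative and the paper's actual estimator and training procedure are not reproduced here.

```python
from typing import Dict, List, Sequence


def discounted_accuracy(agreement_probs: Sequence[float], gamma: float = 0.9) -> float:
    """Expected discounted accuracy from the current turn to the end of the dialog:
    sum_k gamma^k * P(agree with expert at turn t+k)."""
    return sum((gamma ** k) * p for k, p in enumerate(agreement_probs))


def choose_action(candidates: Dict[str, List[float]], gamma: float = 0.9) -> str:
    """Pick the candidate action whose predicted future agreement with the expert,
    discounted toward the present, is highest."""
    return max(candidates, key=lambda a: discounted_accuracy(candidates[a], gamma))


# Toy example (invented numbers): "confirm_date" matches the expert now but
# hurts later turns; "ask_time" gives up some immediate accuracy for better
# long-run agreement, so it wins under the discounted objective.
candidates = {
    "confirm_date": [0.9, 0.4, 0.3],
    "ask_time":     [0.6, 0.8, 0.8],
}
print(choose_action(candidates))  # -> "ask_time" with gamma = 0.9
```

In contrast, a conventional supervised learner would score only the immediate turn (the first entry of each list) and would pick "confirm_date" in this toy example, ignoring the downstream effect of the action.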