Training Dialogue Systems With Human Advice

Merwan Barlier; Romain Laroche; Olivier Pietquin

Proceedings of the 17th International Conference on Autonomous Agents and Multiagent Systems (AAMAS) | July 2018

One major drawback of Reinforcement Learning (RL) Spoken Dialogue Systems is that they inherit the general exploration requirements of RL, which makes them hard to deploy in industry. Industrial systems, on the other hand, rely on human expertise and handwritten rules to prevent irrelevant behavior and to maintain an acceptable user experience. In this paper, we attempt to bridge the gap between these two worlds by providing an easy way to incorporate all kinds of human expertise into the training phase of a Reinforcement Learning Dialogue System. Our approach, based on the TAMER framework, enables safe and efficient policy learning by combining the traditional Reinforcement Learning reward signal with an additional reward that encodes expert advice. Experimental results show that our method leads to substantial improvements over more traditional Reinforcement Learning methods.
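
The idea of combining the task reward with an advice reward can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a simple additive combination and a tabular Q-learning update, neither of which the abstract specifies. The names `combined_reward`, `expert_advice`, `advice_weight`, and the dialogue states and actions are all hypothetical.

```python
from collections import defaultdict

ACTIONS = ["ask_slot", "close_dialogue"]  # hypothetical dialogue actions


def expert_advice(state, action):
    """Hypothetical handwritten rule: discourage closing the dialogue early."""
    if state == "slots_missing" and action == "close_dialogue":
        return -1.0
    return 0.0


def combined_reward(env_reward, advice_reward, advice_weight=1.0):
    """Assumed additive combination of task reward and expert-advice reward."""
    return env_reward + advice_weight * advice_reward


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.95):
    """One tabular Q-learning step on the already combined reward."""
    best_next = max(q[(next_state, a)] for a in ACTIONS)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


# Usage: the environment gives no reward yet, but the advice penalizes the action.
q = defaultdict(float)
r = combined_reward(0.0, expert_advice("slots_missing", "close_dialogue"))
q_update(q, "slots_missing", "close_dialogue", r, "ended")
```

In this sketch the advice term steers the agent away from an undesirable action before the environment reward has given any signal, which is the kind of safer exploration the abstract describes.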