Optimising Turn-Taking Strategies With Reinforcement Learning
- Hatim Khouzaimi
- Romain Laroche
- Fabrice Lefevre
Proceedings of the 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL)
In this paper, reinforcement learning (RL) is used to learn an efficient turn-taking management model in a simulated slot-filling task, with the objective of minimising dialogue duration and maximising the task completion ratio. Turn-taking decisions are handled in a separate new module, the Scheduler. Unlike most dialogue systems, a dialogue turn is split into micro-turns and the Scheduler makes a decision at each one of them. A Fitted Value Iteration algorithm, Fitted-Q, with a linear state representation is used to learn the state-to-action policy. A comparison between non-incremental and incremental handcrafted strategies, taken as baselines, and an incremental RL-based strategy shows the latter to be significantly more efficient, especially in noisy environments.
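To make the learning setup concrete, below is a minimal sketch of batch Fitted-Q iteration with a linear state representation, the kind of algorithm the abstract names. The feature dimension, action count, toy transition data, and the function name `fitted_q` are all assumptions chosen for illustration and are not taken from the paper or its implementation.

```python
# Illustrative sketch only: Fitted-Q iteration with one linear model per action.
# All names, feature choices, and the toy transition set are assumptions.
import numpy as np

def fitted_q(transitions, n_actions, n_features, gamma=0.95, n_iters=50):
    """Batch Fitted-Q iteration with a linear Q-function per action.

    transitions: list of (phi_s, a, r, phi_s_next, done) tuples, where
    phi_s / phi_s_next are feature vectors of the (micro-turn) state.
    Returns W of shape (n_actions, n_features) so that Q(s, a) ~ W[a] @ phi_s.
    """
    W = np.zeros((n_actions, n_features))
    for _ in range(n_iters):
        # Regression targets: r + gamma * max_a' Q(s', a') under the current W.
        X = [[] for _ in range(n_actions)]
        y = [[] for _ in range(n_actions)]
        for phi_s, a, r, phi_next, done in transitions:
            target = r if done else r + gamma * np.max(W @ phi_next)
            X[a].append(phi_s)
            y[a].append(target)
        # Least-squares refit of each action's linear Q-function.
        for a in range(n_actions):
            if X[a]:
                W[a], *_ = np.linalg.lstsq(np.array(X[a]), np.array(y[a]), rcond=None)
    return W

# Toy usage with random transitions (purely illustrative, not the paper's task).
rng = np.random.default_rng(0)
demo = [(rng.normal(size=4), int(rng.integers(3)), rng.normal(),
         rng.normal(size=4), bool(rng.integers(2))) for _ in range(200)]
W = fitted_q(demo, n_actions=3, n_features=4)
print("Greedy action for a sample micro-turn state:", int(np.argmax(W @ demo[0][0])))
```

In such a setup, the learned weights would be queried at every micro-turn by a Scheduler-like component, taking the greedy action (e.g. keep listening or take the floor) for the current state features.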