Models for Multiparty Engagement in Open-World Dialog

Proceedings of the SIGDIAL 2009 Conference, The 10th Annual Meeting of the Special Interest Group on Discourse and Dialogue

We present computational models that allow spoken dialog systems to handle multi-participant engagement in open, dynamic environments, where multiple people may enter and leave conversations, and interact with the system and with others in a natural manner. The models for managing the engagement process include components for (1) sensing the engagement state, actions and intentions of multiple agents in the scene, (2) making engagement decisions (i.e. whom to engage with, and when) and (3) rendering these decisions in a set of coordinated low-level behaviors in an embodied conversational agent. We review results from a study of interactions “in the wild” with a system that implements such a model.
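The sense–decide–render pipeline described in the abstract can be illustrated with a minimal sketch. The class, scoring heuristic, and thresholds below are hypothetical stand-ins for the learned models in the actual system; they only show how per-agent sensing feeds engagement decisions that are then rendered as behaviors:

```python
from dataclasses import dataclass

@dataclass
class Agent:
    """Sensed state of one person in the scene (illustrative fields)."""
    agent_id: str
    engaged: bool          # currently in conversation with the system
    facing_system: bool    # sensed attention cue
    distance_m: float      # sensed distance from the system

def engagement_intention_score(agent: Agent) -> float:
    """Crude stand-in for a learned model of engagement intention."""
    score = 0.0
    if agent.facing_system:
        score += 0.6
    if agent.distance_m < 1.5:
        score += 0.4
    return score

def decide_engagement(agents: list[Agent], threshold: float = 0.8) -> list[str]:
    """Decide whom to engage or disengage; each action would be rendered
    as coordinated low-level behaviors (gaze, greeting, closing)."""
    actions = []
    for a in agents:
        score = engagement_intention_score(a)
        if not a.engaged and score >= threshold:
            actions.append(f"engage:{a.agent_id}")     # e.g. turn head, greet
        elif a.engaged and score < threshold / 2:
            actions.append(f"disengage:{a.agent_id}")  # e.g. close conversation
    return actions
```

For example, `decide_engagement([Agent("p1", False, True, 1.0), Agent("p2", True, False, 3.0)])` returns `["engage:p1", "disengage:p2"]`: a nearby, attentive newcomer is engaged while a departed participant is disengaged.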

The Assistant: Situated Interaction Project (2012)

The Assistant was a long-running AI system developed as part of the Situated Interaction project at Microsoft Research. Designed to function as a working administrative assistant, it was stationed outside the office of Eric Horvitz—then Lab Director at Microsoft Research Redmond. This video showcases the Assistant in action, highlighting its capabilities across a variety of scenarios. You can also see the system operate “in the wild” in this TED talk.

The Assistant served as an exploratory AI research testbed, blending multiple strands of AI into a unified, real-world application. Built to operate in the dynamic environment of a research lab, the Assistant helped coordinate meetings with Eric and briefed him on missed events upon his return. It was capable of engaging in multiparty dialogue, drawing on natural language processing, machine vision, speech recognition, and acoustical sensing. The Assistant project was co-led by Dan Bohus and Eric Horvitz, with significant contributions from Anne Loomis Thompson, Paul Koch, Tomislav Pejsa, Michael Cohen, and James Mahoney.

The Assistant was a descendant of the earlier Receptionist project, a research effort on multiparty dialog capabilities. The project took an “integrative AI” approach—bringing together a constellation of technologies to create a cohesive, intelligent agent with the intuitions of a long-term administrative assistant. The Assistant leveraged several specialized systems that had previously been developed as standalone research efforts, including:

  • Coordinate – Uses machine learning to predict someone’s presence and availability, including forecasts of return times and when they would next check email. The system also considered predictions of meetings someone was likely to skip, allowing others to “pencil in” meetings accordingly.
  • BusyBody – Assesses the cost of interrupting someone based on contextual information such as desktop activity, conversation, and location. BusyBody was part of longer-term studies on using machine learning to model the cost of interruption and recovery from disruptions.
  • Jogger – Uses machine learning to predict the likelihood that someone will forget information that would be valuable in a given setting.
  • Priorities – Ranks unread emails by estimating the cost of delayed review.
  • Models for multiparty engagement – Enables systems to recognize and support dialog with multiple people in a joint conversation.
  • Multichannel grounding – Considers uncertainty at multiple levels, including vision, speech recognition, natural language understanding, and the core assistant domain, and provides linguistic and gestural cues about uncertainty aimed at resolution.
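The multichannel grounding idea can be sketched as a small decision rule: combine per-channel confidences and map the overall uncertainty to a grounding behavior. The channel names, thresholds, and action labels below are hypothetical, chosen only to illustrate the shape of the decision, not the project's actual policy:

```python
def grounding_action(confidences: dict[str, float]) -> str:
    """Pick a grounding behavior from per-channel confidence scores.

    Channels might include 'vision', 'speech', 'nlu', and 'domain';
    here the weakest channel dominates the overall assessment.
    """
    overall = min(confidences.values())
    if overall >= 0.9:
        return "proceed"              # accept the interpretation silently
    if overall >= 0.7:
        return "implicit_confirm"     # echo understanding in the next utterance
    if overall >= 0.4:
        return "explicit_confirm"     # ask a confirmation question
    return "signal_nonunderstanding"  # gestural cue, ask the person to repeat
```

For instance, high confidence across all channels lets the system proceed, while one weak channel (say, noisy speech recognition) pushes it toward an explicit confirmation question.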

The Assistant operated for several years, acting as an auxiliary aide until Eric transitioned to a new role as Director of Microsoft Research and moved to a different office.