Situated Interaction overview

Chapter 3, in The Handbook of Multimodal-Multisensor Interfaces: Language Processing, Software, Commercialization, and Emerging Directions

Published by Association for Computing Machinery and Morgan & Claypool | 2019

ISBN: 978-1-970001-75-4

Interacting with computers via natural language is an enduring aspiration in artificial intelligence. The earliest attempts at dialog between computers and people were text-based dialog systems, such as Eliza [Weizenbaum 1966], a pattern-matching chatbot that emulated a psychotherapist, and SHRDLU [Winograd 1971], a natural language understanding system that allowed interactions with a blocks world. Over the years, progress in large-vocabulary speech recognition, natural language understanding, and speech synthesis has enabled the addition of speech capabilities. Attention shifted toward a new class of task-oriented spoken dialog systems, in which users interact with a system via spoken language over multiple turns to accomplish specific tasks. Examples include the ATIS [Pallett et al. 1992] and Communicator [Walker et al. 2002] projects, which aimed to provide information and make reservations for flights and hotels; the TRIPS system [Ferguson and Allen 1999], which explored collaborative planning tasks in various logistics domains; and CALO [Tur et al. 2010], which aimed to construct a personal assistant. The accumulated body of research and technical progress in task-oriented spoken dialog systems led to the personal assistants embedded in mobile phones, such as Siri, Google Assistant, and Cortana. More recently, these technologies have been deployed in standalone smart speaker devices running Alexa, Cortana, and the Google Assistant.

The Assistant: Situated Interaction Project (2012)

The Assistant was a long-running AI system developed as part of the Situated Interaction project at Microsoft Research. Designed to function as a working administrative assistant, it was stationed outside the office of Eric Horvitz, then Lab Director at Microsoft Research Redmond. This video showcases the Assistant in action, highlighting its capabilities across a variety of scenarios. You can also see the system operate “in the wild” in this TED talk.

The Assistant served as an exploratory AI research testbed, blending multiple strands of AI into a unified, real-world application. Built to operate in the dynamic environment of a research lab, the Assistant helped coordinate meetings with Eric and briefed him on missed events upon his return. It was capable of engaging in multiparty dialog, drawing on natural language processing, machine vision, speech recognition, and acoustical sensing. The Assistant project was co-led by Dan Bohus and Eric Horvitz, with significant contributions from Anne Loomis Thompson, Paul Koch, Tomislav Pejsa, Michael Cohen, and James Mahoney.

The Assistant was a descendant of the earlier Receptionist project, a research effort on multiparty dialog capabilities. The project took an “integrative AI” approach—bringing together a constellation of technologies to create a cohesive, intelligent agent with the intuitions of a long-term administrative assistant. The Assistant leveraged several specialized systems that had previously been developed as standalone research efforts, including:

  • Coordinate – Uses machine learning to predict someone’s presence and availability, including forecasts of return times and of when they will next check email. The system also considers meetings someone is likely to skip, allowing others to “pencil in” meetings accordingly.
  • BusyBody – Assesses the cost of interrupting someone based on contextual information such as desktop activity, conversation, and location. BusyBody was part of longer-term studies on the use of machine learning to estimate the cost of interruption and to support recovery from disruptions.
  • Jogger – Uses machine learning to predict the likelihood that someone will forget information that would be valuable in a given setting.
  • Priorities – Ranks unread emails by estimating the cost of delayed review.
  • Models for multiparty engagement – Enable systems to recognize and support dialog with multiple people in a joint conversation.
  • Multichannel grounding – Considers uncertainty at multiple levels, including vision, speech recognition, natural language understanding, and the core assistant domain, and provides linguistic and gestural cues aimed at resolving that uncertainty; a minimal sketch of this idea appears after the list.

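To make the multichannel grounding idea concrete, the sketch below shows one simple way such a policy could look: per-channel confidences are combined into a joint estimate, which is then mapped to a grounding move. This is only an illustrative sketch; the class names, channel labels, and thresholds are hypothetical and do not reflect the Assistant’s actual implementation.

```python
# A minimal sketch (not the project's actual code) of confidence-based grounding.
# All class names, channel labels, and thresholds are hypothetical.

from dataclasses import dataclass
from typing import List

@dataclass
class ChannelEstimate:
    channel: str       # e.g., "vision", "speech", "nlu", "domain"
    confidence: float  # calibrated probability that this channel's hypothesis is correct

def joint_confidence(estimates: List[ChannelEstimate]) -> float:
    """Treat the channels as a pipeline: the overall hypothesis holds only if
    every stage is correct, so multiply the per-channel confidences."""
    joint = 1.0
    for e in estimates:
        joint *= e.confidence
    return joint

def grounding_action(estimates: List[ChannelEstimate]) -> str:
    """Map joint confidence to a grounding move, echoing the idea of signaling
    uncertainty through linguistic and gestural cues."""
    joint = joint_confidence(estimates)
    if joint > 0.90:
        return "proceed"                # accept silently and act on the hypothesis
    if joint > 0.60:
        return "implicit_confirmation"  # fold the understanding into the next reply
    if joint > 0.30:
        return "explicit_confirmation"  # ask a direct yes/no clarification question
    return "signal_non_understanding"   # indicate confusion and re-prompt

if __name__ == "__main__":
    estimates = [ChannelEstimate("vision", 0.95),
                 ChannelEstimate("speech", 0.80),
                 ChannelEstimate("nlu", 0.85)]
    print(grounding_action(estimates))  # prints "implicit_confirmation" (joint ~= 0.65)
```
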
The Assistant operated for several years, acting as an auxiliary aide until Eric transitioned to a new role as Director of Microsoft Research and moved to a different office.