{"id":478827,"date":"2018-04-13T15:11:14","date_gmt":"2018-04-13T22:11:14","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?p=478827"},"modified":"2018-04-13T15:39:18","modified_gmt":"2018-04-13T22:39:18","slug":"platform-situated-intelligence-tools-framework-multimodal-interaction-research","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/research\/blog\/platform-situated-intelligence-tools-framework-multimodal-interaction-research\/","title":{"rendered":"Platform for Situated Intelligence: Tools and Framework for Multimodal Interaction Research"},"content":{"rendered":"

Over the last decade, advances in machine learning coupled with the availability of large amounts of data have led to significant progress on long-standing AI challenges. In domains like computer vision, speech recognition, machine translation and image captioning, machines have reached and sometimes even exceeded human performance levels on specific problem sets. However, building end-to-end, multimodal interactive systems that bring together multiple AI technologies and interact with people in the open world remains an important challenge.

### Challenges with Multimodal Interactive Systems

Consider a robot that can escort visitors from one location to another in a building, all the while interacting with them via natural language. Or consider a meeting room virtual assistant that tries to understand the dynamics of human interactions and provide assistance on demand. These types of systems require assembling and coordinating a diverse set of AI technologies: localization and mapping, person detection and tracking, attention tracking, speech recognition, sound source localization, speech source and addressee detection, natural language processing, dialog management, natural language generation and more.

The sheer complexity of these systems creates significant engineering challenges that are further amplified by several unique attributes. These systems are highly multimodal; acting in the physical world requires that they process and fuse high-density streams of data from multiple sensors. Multiple components need to process data in parallel, yet they also must be tightly coordinated to produce coherent internal states as well as timely results. Given the frequent use of machine learning and inference models, as well as interactivity needs, these systems often operate under uncertainty and under tight latency constraints. Reasoning about time and uncertainty is therefore paramount. Unfortunately, these constructs are not yet core primitives in our programming languages and platforms.
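To make this concrete, here is a minimal sketch of how time-aware streams look in the framework's .NET API. It assumes the `Pipeline.Create`, `Generators.Sequence`, `Join` and `Do` operators from the Microsoft.Psi library; exact overloads and names may differ across releases, and the stream rates and tolerance below are purely illustrative:

```csharp
using System;
using Microsoft.Psi;

class Program
{
    static void Main()
    {
        // A pipeline hosts concurrently executing components connected by
        // streams; every message is stamped with an originating time at the source.
        using (var p = Pipeline.Create())
        {
            // Two synthetic "sensor" streams ticking at different rates.
            var fast = Generators.Sequence(p, 0.0, x => x + 0.1, 100, TimeSpan.FromMilliseconds(20));
            var slow = Generators.Sequence(p, 0.0, x => x + 1.0, 40, TimeSpan.FromMilliseconds(50));

            // Components run in parallel, but fusion is coordinated on
            // originating time: Join pairs messages within a 25 ms tolerance.
            fast.Select(Math.Sin)
                .Join(slow, TimeSpan.FromMilliseconds(25))
                .Do((m, e) => Console.WriteLine($"{e.OriginatingTime.TimeOfDay}: sin={m.Item1:F3}, slow={m.Item2}"));

            p.Run();
        }
    }
}
```

Because the join keys on originating times rather than arrival times, the fused result stays coherent and reproducible even when the upstream components execute at different speeds.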

Additional software engineering challenges arise in the realm of debugging and maintenance. Visualization tools for multimodal temporal data can be an important accelerator, but they are largely missing. Interesting challenges also stem from the fact that these systems often couple human-authored deterministic components with multiple, often chained, machine learning models. We know a lot about how to solve individual inference problems via machine learning, but much less about how to resolve the software engineering and maintenance problems that arise from integrating multiple such models in end-to-end systems.
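As a hypothetical illustration of how such debugging support can work, the sketch below persists streams to an on-disk store so they can later be examined in PsiStudio, the framework's visualization tool, or replayed for offline analysis. The store name and path are made up for the example, and `PsiStore.Create` has gone by `Store.Create` in some releases:

```csharp
using System;
using Microsoft.Psi;

class Logging
{
    static void Main()
    {
        using (var p = Pipeline.Create())
        {
            // Create an on-disk store; the name and path are illustrative.
            var store = PsiStore.Create(p, "Demo", @"C:\Data");

            // Log both a raw stream and a derived stream. The persisted,
            // time-stamped data can be visualized in PsiStudio or replayed.
            var readings = Generators.Sequence(p, 0.0, x => x + 0.1, 200, TimeSpan.FromMilliseconds(10));
            readings.Write("Readings", store);
            readings.Select(Math.Sin).Write("Sin", store);

            p.Run();
        }
    }
}
```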

\"\"<\/p>\n

At Microsoft Research, as part of the Situated Interaction research effort and in other robotics research teams, we have developed a host of physically situated, multimodal interactive systems, from robots that give directions, to embodied personal assistants, to elevators that recognize intentions to board versus walk by. In the process, we have experienced firsthand the challenges of building physically situated, multimodal interactive systems, and we have learned a number of important lessons. Given these challenges and the overhead involved, interesting research problems are often out of reach and remain unaddressed.

We believe that the right set of primitives and tools can significantly lower the barrier to entry for developing multimodal interactive systems and enable more researchers to tackle problems that only become evident when an end-to-end system is deployed in the real world. Over the last several years we've embarked on constructing a platform to help address these problems, and we are now happy to announce the initial beta, open-source release of this framework, called Platform for Situated Intelligence.

### Platform for Situated Intelligence

Platform for Situated Intelligence is an open-source, extensible framework intended to enable the rapid development, fielding and study of situated, integrative AI systems.

The term *situated* refers to the fact that the framework primarily targets systems that sense and act in the physical world. This covers a broad class of applications, including various cyberphysical systems: interactive robots, drones, embodied conversational agents, personal assistants, interactive instrumented meeting rooms, software systems that mesh human and machine intelligence and so on. Generally, any system that operates over streaming data under low-latency constraints is a good candidate. The term *integrative AI* refers to the fact that the platform primarily targets systems that combine multiple, heterogeneous AI technologies and components.

The platform provides an infrastructure, a set of tools and an ecosystem of reusable components that aim to mitigate some of the challenges that arise in the development of these systems. The primary goal is to speed up and simplify the development, debugging, analysis, maintenance and continuous evolution of integrative systems by empowering developer-in-the-loop scenarios and rapid iteration.

The platform also aims to enable fundamental research into the science of integrative AI systems. Currently, these systems are typically constructed as a tapestry of heterogeneous technologies, which precludes studying and optimizing the system as a whole. There are a number of interesting problems and opportunities in the space of integrative AI that are very difficult to explore in real systems designed using current technologies. Platform for Situated Intelligence aims to provide the underlying set of abstractions that will enable meta-reasoning (or system-level reasoning) and foster research into this nascent science of integrative systems.

Finally, Platform for Situated Intelligence is open and extensible. We have released it as open source because success ultimately depends on engaging the community and enabling contributions at all levels of the framework by creating a thriving ecosystem of reusable AI components.

The following sections present a brief introduction to each of the three major areas of the framework: