The Interactive Multimodal AI Systems group focuses on creating interactive systems and experiences that blend the richness and complexity of people and their real, physical world with advanced technology. We seek to leverage multimodal generative AI models that incorporate multiple sensing modalities, such as video and speech, as well as models of spatial reasoning, human behavior, and affect.
Our work is driven by enabling interactions through modalities beyond vision and language, including touch, gestures, speech, sound, gaze, smell, and other physiological signals. Beyond sensing, we consider display technologies such as computer graphics, audio, and augmented reality integral to delivering the next great computing experience. These systems span a wide range of devices, including handheld, stationary, head-mounted, on-body, and midair, and are explored across diverse environments, from 2D screens to 3D immersive spaces. We recognize the value of building and evaluating systems to validate our basic research and reveal new avenues of innovation.