{"id":951717,"date":"2023-06-29T09:00:00","date_gmt":"2023-06-29T16:00:00","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?p=951717"},"modified":"2023-06-29T09:06:24","modified_gmt":"2023-06-29T16:06:24","slug":"breaking-cross-modal-boundaries-in-multimodal-ai-introducing-codi-composable-diffusion-for-any-to-any-generation","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/research\/blog\/breaking-cross-modal-boundaries-in-multimodal-ai-introducing-codi-composable-diffusion-for-any-to-any-generation\/","title":{"rendered":"Breaking cross-modal boundaries in multimodal AI: Introducing CoDi, composable diffusion for any-to-any generation"},"content":{"rendered":"\n

Imagine an AI model that can seamlessly generate high-quality content across text, images, video, and audio, all at once. Such a model would more accurately capture the multimodal nature of the world and of human comprehension, consolidate information from a wide range of sources, and enable more immersive human-AI interactions. It could transform the way people interact with computers across a variety of tasks, including assistive technology, customized learning tools, ambient computing, and content generation.
