{"id":1132209,"date":"2025-02-25T12:34:50","date_gmt":"2025-02-25T20:34:50","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-video&p=1132209"},"modified":"2025-04-07T10:22:14","modified_gmt":"2025-04-07T17:22:14","slug":"magma-a-foundation-model-for-multimodal-ai-agents-microsoft-research-forum","status":"publish","type":"msr-video","link":"https:\/\/www.microsoft.com\/en-us\/research\/video\/magma-a-foundation-model-for-multimodal-ai-agents-microsoft-research-forum\/","title":{"rendered":"Magma: A foundation model for multimodal AI Agents | Microsoft Research Forum"},"content":{"rendered":"\n
\n

Presented by\u00a0Jianwei Yang<\/a>\u00a0at<\/em>\u00a0Microsoft Research Forum, Episode 5<\/strong><\/em><\/p>\n\n\n\n

Jianwei Yang, Principal Researcher at Microsoft Research Redmond, introduces Magma, a new multimodal agentic foundation model designed for UI navigation in digital environments and robotic manipulation in physical settings. The talk covers two new techniques, Set-of-Mark and Trace-of-Mark, for action grounding and planning, and details the unified pretraining pipeline through which the model learns agentic capabilities.<\/p>\n<\/div><\/div>\n\n\n\n

\n
Register for the series<\/a><\/div>\n\n\n\n
Other Episode 5 talks<\/a><\/div>\n\n\n\n
All previous talks<\/a><\/div>\n<\/div>\n\n\n\n
\n\t