Magma: A foundation model for multimodal AI Agents | Microsoft Research Forum

Jianwei Yang, Principal Researcher, Microsoft Research Redmond, introduces Magma, a new multimodal agentic foundation model designed for UI navigation in digital environments and robotic manipulation in physical settings. The talk covers two new techniques, Set-of-Mark and Trace-of-Mark, for action grounding and planning, and details the unified pretraining pipeline used to learn agentic capabilities.

This session aired on February 25, 2025, at Microsoft Research Forum, Episode 5.

Register for the series: https://aka.ms/registerresearchforumYTe5

Continue watching episode 5: https://aka.ms/researchforumYTe5
Explore all previous episodes: https://aka.ms/researchforumYTplaylist

Date:
Speakers: Jianwei Yang
Affiliation: Microsoft Research Redmond

Series: Microsoft Research Forum