<h1>Advancing AI for the physical world</h1>
<p><em>Published January 21, 2026 | Microsoft Research</em></p>
<h2>For decades, robots have excelled in structured settings like assembly lines, where tasks are predictable and tightly scripted.</h2>

<blockquote>
<p>“The emergence of vision-language-action (VLA) models for physical systems is enabling systems to perceive, reason, and act with increasing autonomy alongside humans in environments that are far less structured.”</p>
<cite>– Ashley Llorens, Corporate Vice President and Managing Director, Microsoft Research Accelerator</cite>
</blockquote>


<p>Physical AI, where agentic AI meets physical systems, is poised to redefine robotics in the same way that generative models have transformed language and vision processing.</p>

<p>Today, we are announcing Rho-alpha (<em>ρ<sub>α</sub></em>), our first robotics model derived from Microsoft’s Phi series of vision-language models.</p>

<p>We invite organizations interested in evaluating Rho-alpha for their robots and use cases to express interest in the Rho-alpha Research Early Access Program. Rho-alpha will also be made available via Microsoft Foundry at a later date.</p>

<p>Rho-alpha translates natural language commands into control signals for robotic systems performing bimanual manipulation tasks. It can be described as a VLA+ model in that it expands the set of perceptual and learning modalities beyond those typically used by VLAs. For perception, Rho-alpha adds tactile sensing, with efforts underway to accommodate modalities such as force. For learning, we are working toward enabling Rho-alpha to continually improve during deployment by learning from feedback provided by people.</p>
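<p>To make the interface concrete, the sketch below shows what a VLA+ control loop of this shape might look like: camera and tactile observations plus a language instruction go in, per-arm control targets come out. Everything here is illustrative, assumed for the example only. The function names, the 7-DoF action layout, and the sensor shapes are hypothetical and are not part of any published Rho-alpha API.</p>

```python
import numpy as np

def get_camera_frame():
    # Hypothetical stand-in for the robot's RGB camera feed.
    return np.zeros((224, 224, 3), dtype=np.uint8)

def get_tactile_reading():
    # Hypothetical stand-in for fingertip tactile arrays
    # (2 fingers x 16 taxels), the extra modality a VLA+ adds.
    return np.zeros((2, 16), dtype=np.float32)

class VlaPolicy:
    """Toy placeholder for a VLA+ policy.

    A real model maps (image, tactile signal, instruction) to
    low-level control targets; this stub returns zeros purely to
    illustrate the shape of the interface.
    """
    def predict(self, image, tactile, instruction):
        # One 7-dim action per arm for bimanual manipulation,
        # e.g. 6 end-effector deltas + 1 gripper command (assumed).
        return {"left_arm": np.zeros(7), "right_arm": np.zeros(7)}

def control_step(policy, instruction):
    # One tick of the perception -> action loop.
    image = get_camera_frame()
    tactile = get_tactile_reading()
    return policy.predict(image, tactile, instruction)

action = control_step(VlaPolicy(), "Push the green button with the right gripper")
print(sorted(action))  # ['left_arm', 'right_arm']
```

<p>In deployment such a loop would run at the controller's rate, with the deployment-time human feedback described above folded back into the policy between or during episodes.</p>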

<p>Through these advancements, we aim to make physical systems more easily adaptable, viewing adaptability as a hallmark of intelligence. We believe robots that can adapt more easily to dynamic situations and to human preferences will be more useful in the environments in which we live and work, and more trusted by the people who deploy and operate them.</p>


<figure><figcaption>Prompt: “Push the green button with the right gripper”</figcaption></figure>