{"id":1162546,"date":"2026-03-03T10:05:18","date_gmt":"2026-03-03T18:05:18","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-video&p=1162546"},"modified":"2026-03-03T10:05:20","modified_gmt":"2026-03-03T18:05:20","slug":"aro-a-new-lens-on-matrix-optimization-for-llms","status":"publish","type":"msr-video","link":"https:\/\/www.microsoft.com\/en-us\/research\/video\/aro-a-new-lens-on-matrix-optimization-for-llms\/","title":{"rendered":"ARO: A new lens on matrix optimization for LLMs"},"content":{"rendered":"\n
\n

We present Adaptively Rotated Optimization (ARO), a matrix optimizer that speeds up LLM training by applying updates in a rotated, geometry-aware coordinate system. Guided by new insights into global structures in LLM loss landscapes, ARO treats rotation as a unifying principle for sample efficiency and proposes a new update policy that is applicable to all model weight matrices. In large-scale controlled experiments, ARO consistently outperforms AdamW and orthogonalization-based methods, maintaining its gains as models and training budgets scale.<\/p>\n\n\n\n

Explore more<\/h2>\n\n\n\n