Deep Learning Group

News & features

Articles

Magma: A foundation model for multimodal AI agents

February 25, 2025

Jianwei Yang introduces Magma, a new multimodal agentic foundation model designed for UI navigation in digital environments and robotics manipulation in physical settings.

Microsoft Research Blog

BiomedParse: A foundation model for smarter, all-in-one biomedical image analysis

November 18, 2024 | Hoifung Poon, Theodore Zhao, Aiden Gu, Mu Wei, and Sheng Wang

BiomedParse reimagines medical image analysis, integrating advanced AI to capture complex insights across imaging types—a step forward for diagnostics and precision medicine.

Microsoft Research Blog

Research Focus: Week of February 19, 2024

February 21, 2024

In this issue: CaaSPER: vertical autoscaling algorithm dynamically maintains optimal CPU utilization; Improved scene landmark detection for camera localization runs faster, uses less storage; ESUS simplifies usability questionnaires for technical products and services.

Microsoft Research Blog

NeurIPS 2023 highlights breadth of Microsoft’s machine learning innovation

December 11, 2023

We’re proud to have 100+ accepted papers at NeurIPS 2023, plus 18 workshops. Several submissions were chosen as oral presentations and spotlight posters, reflecting groundbreaking concepts, methods, or applications. Here’s an overview of those submissions.

Microsoft Research Blog

Frontiers of multimodal learning: A responsible AI approach

September 6, 2023

New evaluation methods and a commitment to continual improvement are musts if we’re to build multimodal AI systems that advance human goals. Learn about cutting-edge research into the responsible development and use of multimodal AI at Microsoft.

Articles

Check Your Facts and Try Again: Improving Large Language Models with External Knowledge and Automated Feedback

March 7, 2023

Large language models (LLMs), such as ChatGPT, are able to generate human-like, fluent responses for many downstream tasks, e.g., task-oriented dialog and question answering. However, applying LLMs to real-world, mission-critical applications remains challenging mainly due to their tendency to generate…

Microsoft Research Blog

Research Focus: Week of November 7, 2022

November 8, 2022

Welcome to Research Focus, a new series of blog posts that highlights notable publications, events, code/datasets, new hires and other milestones from across the research community at Microsoft. Barun Patra, Saksham Singhal, Shaohan Huang, Zewen Chi, Li Dong, Furu Wei,…

Articles

FocalNets: Focus Eyes with Focal Modulation

November 1, 2022

Human eyes have a dynamic focusing system that adjusts the focal regions in order to see the surroundings at all distances. When we look far away, up close, and back again, our eyes change focus rapidly to allow us to…

Articles

ECCV Workshop on “Computer Vision in the Wild”

September 8, 2022

Website: https://computer-vision-in-the-wild.github.io/eccv-2022/ Workshop: The research community has recently witnessed a trend in building transferable visual models that can effortlessly adapt to a wide range of downstream computer vision (CV) and multimodal (MM) tasks. We are organizing…