Microsoft Research blog
Announcing the DeepSpeed4Science Initiative: Enabling large-scale scientific discovery through sophisticated AI system technologies
| Shuaiwen Leon Song, Bonnie Kruft, Minjia Zhang, Conglong Li, Martin Cai, and Yuxiong He
Editor’s note, Sept. 28, 2023 – The founding collaborators list was updated to correct omissions, and the scientific foundation model graph was updated to correct information.
In the next decade, deep learning may revolutionize the natural sciences, enhancing our capacity to…
DeepSpeed: Accelerating large-scale model inference and training via system optimizations and compression
| DeepSpeed Team, Rangan Majumder, and Andrey Proskurin
Last month, the DeepSpeed Team announced ZeRO-Infinity, a step forward in training models with tens of trillions of parameters. In addition to creating optimizations for scale, our team strives to introduce features that also improve speed, cost, and usability. As…
DeepSpeed: Advancing MoE inference and training to power next-generation AI scale
| DeepSpeed Team and Andrey Proskurin
In the last three years, the largest trained dense models have increased in size by over 1,000 times, from a few hundred million parameters to over 500 billion parameters in Megatron-Turing NLG 530B (MT-NLG). Improvements in model quality with size…
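As a back-of-the-envelope check of the scale jump quoted above, the ratio from a few hundred million parameters to MT-NLG’s 530 billion can be computed directly. The 340M figure below is an illustrative stand-in for an early large dense model (roughly BERT-large scale), not a number from the post itself:

```python
# Rough check of the cited growth: a few hundred million parameters
# to MT-NLG's 530 billion. The 340M baseline is an assumption
# (approximately BERT-large scale); 530B is the quoted MT-NLG size.
early_dense_model = 340_000_000       # ~340M parameters (assumed baseline)
mt_nlg = 530_000_000_000              # MT-NLG, 530B parameters

growth = mt_nlg / early_dense_model
print(round(growth))                  # well over 1,000x, consistent with the post
```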
Research Focus: Week of November 7, 2022
Welcome to Research Focus, a new series of blog posts that highlights notable publications, events, code/datasets, new hires and other milestones from across the research community at Microsoft. Barun Patra, Saksham Singhal, Shaohan Huang, Zewen Chi, Li Dong, Furu Wei,…
ZeRO-2 & DeepSpeed: Shattering barriers of deep learning speed & scale
| DeepSpeed Team, Rangan Majumder, and Junhua Wang
In February, we announced DeepSpeed, an open-source deep learning training optimization library, and ZeRO (Zero Redundancy Optimizer), a novel memory optimization technology in the library, which vastly advances large model training by improving scale, speed, cost, and usability. DeepSpeed has…
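The core ZeRO idea named in the teaser is that optimizer states need not be replicated on every data-parallel rank; each rank can own a 1/N shard, cutting per-GPU optimizer memory roughly by the data-parallel degree. A minimal sketch of that accounting follows; the function names and the 12-bytes-per-parameter Adam estimate (fp32 momentum, variance, and master weights) are illustrative assumptions, not the DeepSpeed API:

```python
# Minimal sketch of the ZeRO stage-1 idea: rather than every data-parallel
# rank holding the full optimizer states, each rank keeps only a 1/N shard.
# All names here are illustrative, not actual DeepSpeed functions.

def shard_bounds(num_params: int, world_size: int, rank: int):
    """Return the [start, end) slice of parameters owned by `rank`."""
    base, rem = divmod(num_params, world_size)
    start = rank * base + min(rank, rem)
    end = start + base + (1 if rank < rem else 0)
    return start, end

def per_rank_optimizer_memory(num_params: int, world_size: int,
                              bytes_per_state: int = 12) -> int:
    """Approximate per-rank bytes for Adam-style states (~12 bytes/param:
    fp32 momentum + variance + master weights) under even sharding."""
    largest_shard = -(-num_params // world_size)  # ceiling division
    return largest_shard * bytes_per_state

# Replicated (no ZeRO): every rank stores all optimizer states.
full = per_rank_optimizer_memory(1_000_000, world_size=1)
# ZeRO-style sharding across 8 ranks: each rank stores ~1/8 of the states.
sharded = per_rank_optimizer_memory(1_000_000, world_size=8)
print(full // sharded)  # memory reduction ~ the data-parallel degree: 8
```

In the full ZeRO design, later stages extend the same partitioning to gradients and parameters, which is what pushes the reachable model scale further; the sketch above covers only the optimizer-state stage.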
DeepSpeed: Extreme-scale model training for everyone
| DeepSpeed Team, Rangan Majumder, and Junhua Wang
In February, we announced DeepSpeed, an open-source deep learning training optimization library, and ZeRO (Zero Redundancy Optimizer), a novel memory optimization technology in the library, which vastly advances large model training by improving scale, speed, cost, and usability. DeepSpeed has…
Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, the World’s Largest and Most Powerful Generative Language Model
| Ali Alvi and Paresh Kharya
We are excited to introduce the DeepSpeed- and Megatron-powered Megatron-Turing Natural Language Generation model (MT-NLG), the largest and most powerful monolithic transformer language model trained to date, with 530 billion parameters. It is the result of a research collaboration…
Research at Microsoft 2021: Collaborating for real-world change
Over the past 30 years, Microsoft Research has undergone a shift in how it approaches innovation, broadening its mission to include not only advancing the state of computing but also using technology to tackle some of the world’s most pressing…
Turing-NLG: A 17-billion-parameter language model by Microsoft
| Corby Rosset
This figure was adapted from a similar image published in DistilBERT. Turing Natural Language Generation (T-NLG) is a 17-billion-parameter language model by Microsoft that outperforms the state of the art on many downstream NLP tasks. We present a…