Microsoft Research blog
Research Collection: Tools and Data to Advance the State of the Art
“This is a game changer for the big data community. Initiatives like Microsoft Research Open Data reduce barriers to data sharing and encourage reproducibility by leveraging the power of cloud computing.” —Sam Madden, Professor, Massachusetts Institute of Technology. An open…
Research at Microsoft 2023: A year of groundbreaking AI advances and discoveries
AI saw unparalleled growth in 2023, reaching millions of people daily. This progress owes much to the extensive work of Microsoft researchers and their collaborators. In this review, learn about the advances made in 2023, which set the stage for further progress in 2024.
DeepSpeed ZeRO++: A leap in speed for LLM and chat model training with 4X less communication
By DeepSpeed Team and Andrey Proskurin
Large AI models are transforming the digital world. Generative systems built on large language models (LLMs), such as Turing-NLG, ChatGPT, and GPT-4, are incredibly versatile, capable of performing tasks like summarization, coding, and translation. Similarly, large multimodal generative models like…
ZeRO & DeepSpeed: New system optimizations enable training models with over 100 billion parameters
By DeepSpeed Team, Rangan Majumder, and Junhua Wang
The latest trend in AI is that larger natural language models provide better accuracy; however, larger models are difficult to train because of cost, training time, and the difficulty of integrating them into existing code. Microsoft is releasing an open-source library called DeepSpeed, which vastly…
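For orientation, here is a minimal sketch of how a PyTorch model might be handed to DeepSpeed with ZeRO enabled. The toy model, batch size, and optimizer settings are illustrative assumptions, not values from the post.

```python
import torch
import deepspeed

# A small stand-in model; real ZeRO workloads target models with billions of parameters.
model = torch.nn.Sequential(
    torch.nn.Linear(1024, 4096),
    torch.nn.GELU(),
    torch.nn.Linear(4096, 1024),
)

ds_config = {
    "train_batch_size": 32,
    "fp16": {"enabled": True},
    # ZeRO stage 2 partitions optimizer states and gradients across data-parallel ranks.
    "zero_optimization": {"stage": 2},
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
}

# deepspeed.initialize returns an engine that manages partitioning, mixed precision,
# and communication; training scripts are typically started with the `deepspeed` launcher.
model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)
```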
DeepSpeed powers 8x larger MoE model training with high performance
By DeepSpeed Team and Z-code Team
Today, we are proud to announce DeepSpeed MoE, a high-performance system that supports massive-scale mixture-of-experts (MoE) models as part of the DeepSpeed optimization library. MoE models are an emerging class of sparsely activated…
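As a rough illustration of what “sparsely activated” means (a conceptual sketch, not DeepSpeed MoE’s actual implementation), the layer below routes each token to a single expert, so only a fraction of the layer’s parameters are used for any given token; all class names and sizes are made up for the example.

```python
import torch
import torch.nn as nn

class TinyTop1MoE(nn.Module):
    """Illustrative top-1 routed MoE layer: each token is processed by exactly one expert."""

    def __init__(self, hidden: int = 64, num_experts: int = 8):
        super().__init__()
        self.gate = nn.Linear(hidden, num_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(hidden, 4 * hidden), nn.GELU(), nn.Linear(4 * hidden, hidden))
            for _ in range(num_experts)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, hidden). The gate picks one expert per token.
        expert_ids = self.gate(x).argmax(dim=-1)
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = expert_ids == i
            if mask.any():
                out[mask] = expert(x[mask])  # only the selected tokens pay this expert's cost
        return out

tokens = torch.randn(16, 64)
print(TinyTop1MoE()(tokens).shape)  # torch.Size([16, 64])
```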
DeepSpeed Compression: A composable library for extreme compression and zero-cost quantization
By DeepSpeed Team and Andrey Proskurin
Large-scale models are revolutionizing deep learning and AI research, driving major improvements in language understanding, creative text generation, multilingual translation, and more. But despite their remarkable capabilities, the models’ large size creates latency and cost constraints that hinder the…
Tutel: An efficient mixture-of-experts implementation for large DNN model training
By Wei Cui, Yifan Xiong, Peng Cheng, and Rafael Salas
Mixture of experts (MoE) is a deep learning model architecture in which computational cost grows sublinearly with the number of parameters, making scaling easier. Today, MoE is the only approach demonstrated to scale deep learning models to trillion-plus parameters, paving…
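To make the sublinear-cost point concrete, here is a back-of-the-envelope sketch with made-up layer sizes: the total parameter count grows with the number of experts, while the parameters a single token actually touches (top-1 routing) stay fixed.

```python
hidden, ffn = 1024, 4096
expert_params = 2 * hidden * ffn  # weights of one expert's FFN (biases ignored)
top_k = 1                         # experts activated per token

for num_experts in (1, 8, 64):
    total = num_experts * expert_params   # parameters grow linearly with the expert count
    active = top_k * expert_params        # per-token compute stays roughly constant
    print(f"{num_experts:>3} experts: {total / 1e6:7.1f}M params total, "
          f"{active / 1e6:.1f}M active per token")
```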