{"id":678390,"date":"2020-07-23T17:00:06","date_gmt":"2020-07-24T00:00:06","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-project&p=678390"},"modified":"2024-09-09T08:34:50","modified_gmt":"2024-09-09T15:34:50","slug":"deepspeed","status":"publish","type":"msr-project","link":"https:\/\/www.microsoft.com\/en-us\/research\/project\/deepspeed\/","title":{"rendered":"DeepSpeed"},"content":{"rendered":"
\n\t
\n\t\t
\n\t\t\t\"DeepSpeed\"\t\t<\/div>\n\t\t\n\t\t
\n\t\t\t\n\t\t\t
\n\t\t\t\t\n\t\t\t\t
\n\t\t\t\t\t\n\t\t\t\t\t
\n\t\t\t\t\t\t
\n\t\t\t\t\t\t\t\n\t\t\t\t\t\t\t\n\n

DeepSpeed

Extreme Speed and Scale for DL Training and Inference
DeepSpeed is an easy-to-use deep learning optimization software suite that enables unprecedented scale and speed for DL training and inference. Visit us at deepspeed.ai or our GitHub repo.
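To give a sense of the "easy-to-use" claim, here is a minimal sketch of wrapping a PyTorch model with DeepSpeed. The toy model and the `ds_config.json` file are placeholders, and a real run would be launched with the `deepspeed` launcher rather than plain `python`:

```python
# Minimal DeepSpeed training sketch. The Linear model and ds_config.json
# are stand-ins; launch with the `deepspeed` CLI for distributed runs.
import torch
import deepspeed

model = torch.nn.Linear(1024, 1024)  # stand-in for a real network

# deepspeed.initialize returns an engine that manages distributed training,
# mixed precision, and optimizer state according to the JSON config.
model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config="ds_config.json",
)

x = torch.randn(8, 1024).to(model_engine.device)
loss = model_engine(x).pow(2).mean()   # dummy loss for illustration
model_engine.backward(loss)            # engine-managed backward (loss scaling)
model_engine.step()                    # engine-managed optimizer step
```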
Reshape the Large Model Training Landscape
DeepSpeed offers a confluence of system innovations that have made large-scale DL training effective and efficient, greatly improved ease of use, and redefined the scale of training that is possible. Innovations such as ZeRO, 3D-Parallelism, DeepSpeed-MoE, and ZeRO-Infinity fall under the DeepSpeed-Training pillar. Learn more >>
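ZeRO in particular is enabled purely through the DeepSpeed config. The sketch below shows a hypothetical ZeRO stage-2 configuration as a Python dict; the field names follow the public DeepSpeed config schema, but the values are illustrative placeholders, not tuned recommendations:

```python
# Hypothetical DeepSpeed config enabling ZeRO stage 2.
ds_config = {
    "train_batch_size": 64,
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 2,                  # shard optimizer states and gradients
        "overlap_comm": True,        # overlap reduction with the backward pass
        "contiguous_gradients": True,
    },
}
# Passed as deepspeed.initialize(..., config=ds_config),
# as in the training sketch above.
```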
\"DS-Training<\/td>

Optimize Large Model Inference
DeepSpeed brings together innovations in parallelism technology, such as tensor, pipeline, expert, and ZeRO parallelism, and combines them with high-performance custom inference kernels, communication optimizations, and heterogeneous memory technologies to enable inference at unprecedented scale while achieving unparalleled latency, throughput, and cost reduction. This systematic composition of system technologies for inference falls under the DeepSpeed-Inference pillar. Learn more >>
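As a rough illustration, this is how a Hugging Face model can be handed to DeepSpeed-Inference. The argument names below follow the `init_inference` API as documented around the time of writing and may differ across DeepSpeed versions:

```python
# Sketch of DeepSpeed-Inference on a small Hugging Face model; argument
# names may vary by DeepSpeed version.
import torch
import deepspeed
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")  # illustrative model

# init_inference injects DeepSpeed's optimized kernels and can split the
# model across GPUs via tensor parallelism.
engine = deepspeed.init_inference(
    model,
    mp_size=1,                       # tensor-parallel degree
    dtype=torch.half,                # run inference in fp16
    replace_with_kernel_inject=True, # swap in custom inference kernels
)
# engine.module is the optimized model and is used like the original one.
```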
\"DS-Training<\/td>

Speed Up Inference & Reduce Model Size
To further increase inference efficiency, DeepSpeed offers easy-to-use and flexible-to-compose compression techniques that let researchers and practitioners compress their models while delivering faster speed, smaller model size, and significantly reduced compression cost. State-of-the-art innovations in compression, such as ZeroQuant and XTC, are included under the DeepSpeed-Compression pillar. Learn more >>
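Like training, compression is driven by the DeepSpeed config. The sketch below assumes the `init_compression` entry point described in the DeepSpeed-Compression tutorials; treat the import path and the config contents as assumptions to be checked against the installed version:

```python
# Hedged sketch of DeepSpeed-Compression; entry point and config keys follow
# the DeepSpeed-Compression tutorials and should be verified per version.
import torch
from deepspeed.compression.compress import init_compression  # assumed path

model = torch.nn.Linear(1024, 1024)  # stand-in for a real network

# ds_config.json is assumed to contain a compression section, e.g. weight
# quantization settings for ZeroQuant-style compression. Training then
# proceeds as usual, with compression applied per the config.
model = init_compression(model, deepspeed_config="ds_config.json")
```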
DeepSpeed Model Implementations for Inference (MII)
Instant speedup on 24,000+ open-source DL models with up to 40x cheaper inference.
\"DeepSpeed-MII<\/figure>\n\n\n\n

The deep learning (DL) open-source community has seen tremendous growth in the last few months. Incredibly powerful text-generation models such as Bloom 176B, and image-generation models such as Stable Diffusion, are now available to anyone with access to a handful of GPUs, or even a single one, through platforms such as Hugging Face. While open-sourcing has democratized access to AI capabilities, applying these models is still restricted by two critical factors: 1) inference latency and 2) cost.
There has been significant progress in system optimizations for DL model inference that can drastically reduce both latency and cost, but those optimizations are not easily accessible. The main reason is that the DL model inference landscape is diverse, with models varying in size, architecture, system performance characteristics, hardware requirements, and so on. Identifying the appropriate set of system optimizations for a given model and applying them correctly is often beyond the scope of most data scientists, making low-latency and low-cost inference mostly inaccessible.
DeepSpeed-MII is a new open-source Python library from DeepSpeed, aimed at making low-latency, low-cost inference of powerful models not only feasible but also easily accessible.
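For a flavor of how accessible this is, here is a sketch using the deploy/query-handle interface from MII's early releases (newer releases expose a different, pipeline-based API); the model choice and deployment name are illustrative:

```python
# Sketch of deploying and querying a model with DeepSpeed-MII, using the
# mii.deploy / mii_query_handle API from MII's early releases.
import mii

mii.deploy(
    task="text-generation",
    model="bigscience/bloom-560m",  # illustrative model choice
    deployment_name="bloom_deploy",
)

# Obtain a handle to the running deployment and send a generation request.
generator = mii.mii_query_handle("bloom_deploy")
result = generator.query({"query": ["DeepSpeed is"]}, max_new_tokens=32)
print(result)
```

Behind this interface, MII applies the DeepSpeed-Inference optimizations described above automatically, which is what makes the latency and cost reductions available without per-model tuning.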