{"id":658659,"date":"2020-05-19T08:00:19","date_gmt":"2020-05-19T15:00:19","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?p=658659"},"modified":"2020-06-08T13:24:24","modified_gmt":"2020-06-08T20:24:24","slug":"zero-2-deepspeed-shattering-barriers-of-deep-learning-speed-scale","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/research\/blog\/zero-2-deepspeed-shattering-barriers-of-deep-learning-speed-scale\/","title":{"rendered":"ZeRO-2 & DeepSpeed: Shattering barriers of deep learning speed & scale"},"content":{"rendered":"

\"\"<\/p>\n

In February, we announced DeepSpeed, an open-source deep learning training optimization library, and ZeRO (Zero Redundancy Optimizer), a novel memory optimization technology in the library that vastly advances large model training by improving scale, speed, cost, and usability. DeepSpeed has enabled researchers to create Turing Natural Language Generation (Turing-NLG), which at 17 billion parameters is the largest publicly known language model. Since then, we have continued to innovate rapidly, pushing the boundaries of speed and scale for deep learning training. Today, we are happy to share our new findings and results as we introduce the improved ZeRO-2 and further developments with DeepSpeed: