{"id":783490,"date":"2021-10-11T06:00:00","date_gmt":"2021-10-11T13:00:00","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?p=783490"},"modified":"2021-10-11T05:52:10","modified_gmt":"2021-10-11T12:52:10","slug":"using-deepspeed-and-megatron-to-train-megatron-turing-nlg-530b-the-worlds-largest-and-most-powerful-generative-language-model","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/research\/blog\/using-deepspeed-and-megatron-to-train-megatron-turing-nlg-530b-the-worlds-largest-and-most-powerful-generative-language-model\/","title":{"rendered":"Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, the World\u2019s Largest and Most Powerful Generative Language Model"},"content":{"rendered":"\n

We are excited to introduce the DeepSpeed- and Megatron-powered Megatron-Turing Natural Language Generation model (MT-NLG), the largest and most powerful monolithic transformer language model trained to date, with 530 billion parameters. It is the result of a research collaboration between Microsoft and NVIDIA to further parallelize and optimize the training of very large AI models.
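For readers curious how DeepSpeed and a Megatron-style model combine to parallelize training in practice, here is a minimal illustrative sketch of DeepSpeed pipeline parallelism. The layer stack, parallel degree, and batch settings below are placeholder assumptions, not the actual MT-NLG configuration; in Megatron-LM, tensor (intra-layer) parallelism is set separately via its --tensor-model-parallel-size argument.

```python
# Minimal sketch of DeepSpeed pipeline parallelism (not the MT-NLG recipe).
# Run under the `deepspeed` launcher so torch.distributed is initialized.
import deepspeed
import torch.nn as nn
from deepspeed.pipe import PipelineModule

# Placeholder stack of layers standing in for transformer blocks.
layers = [nn.Linear(1024, 1024) for _ in range(24)]

# Split the layer stack across pipeline stages; num_stages is illustrative.
model = PipelineModule(layers=layers, num_stages=4, loss_fn=nn.MSELoss())

ds_config = {
    "train_batch_size": 256,              # global batch size (placeholder)
    "train_micro_batch_size_per_gpu": 4,  # micro-batches keep the pipeline full
    "fp16": {"enabled": True},            # mixed precision, typical for large models
    "zero_optimization": {"stage": 1},    # ZeRO stage 1 shards optimizer states
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
}

engine, _, _, _ = deepspeed.initialize(model=model, config=ds_config)
# engine.train_batch(data_iter) would then drive one pipelined training step.
```

Data parallelism then composes across replicas of this pipeline, which is how DeepSpeed and Megatron together stack tensor, pipeline, and data parallelism for models at this scale.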

As the successor to Turing NLG 17B and Megatron-LM, MT-NLG has 3x the number of parameters compared to the existing largest model of this type and demonstrates unmatched accuracy in a broad set of natural language tasks such as: