{"id":472845,"date":"2018-03-20T10:20:05","date_gmt":"2018-03-20T17:20:05","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-project&p=472845"},"modified":"2024-04-24T08:19:12","modified_gmt":"2024-04-24T15:19:12","slug":"fiddle","status":"publish","type":"msr-project","link":"https:\/\/www.microsoft.com\/en-us\/research\/project\/fiddle\/","title":{"rendered":"Project Fiddle"},"content":{"rendered":"

Project Fiddle: Fast and Efficient Infrastructure for Distributed Deep Learning

The goal of Project Fiddle is to build efficient systems infrastructure for very fast distributed DNN training; specifically, we aim to make training 100x more efficient. To achieve this, we take a broad view of training: from a single GPU, to multiple GPUs on a machine, all the way to training on large multi-machine clusters. Our innovations cut across the systems stack: the memory subsystem, the structuring of parallel computation across GPUs and machines, and the interconnects between GPUs and across machines.
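To ground what "structuring parallel computation across GPUs and machines" looks like in practice, here is a minimal sketch of data-parallel training using PyTorch's DistributedDataParallel. This is an illustrative assumption, not Project Fiddle's own code: the framework choice, toy model, and hyperparameters are all placeholders.

```python
# Minimal sketch of multi-GPU data-parallel DNN training with PyTorch
# DistributedDataParallel (DDP). Illustrative only -- not Project Fiddle code.
import os

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP


def main():
    # torchrun sets RANK, LOCAL_RANK, and WORLD_SIZE for each worker process.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # A toy model; in practice this would be a full DNN.
    model = nn.Linear(1024, 10).cuda(local_rank)
    # DDP keeps one replica per GPU and all-reduces gradients every step,
    # so the same script scales from one machine to a multi-machine cluster.
    model = DDP(model, device_ids=[local_rank])

    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.CrossEntropyLoss()

    for step in range(100):
        # Each rank would normally read a distinct shard of the dataset
        # (e.g., via DistributedSampler); random tensors stand in here.
        x = torch.randn(32, 1024, device=local_rank)
        y = torch.randint(0, 10, (32,), device=local_rank)

        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()  # gradient all-reduce across GPUs happens here
        optimizer.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

Launched with, for example, `torchrun --nproc_per_node=<num_gpus> train.py`, the same script runs on one GPU, several GPUs in a machine, or many machines; the cost of the gradient synchronization step is exactly where the memory-subsystem and interconnect work described above matters.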

Our work so far has targeted many different parts of the systems stack, organized as the following sub-projects: