Project Overview
This project builds a deep learning compiler and optimizer infrastructure that automatically optimizes models for scalability and efficiency, in both distributed and local execution. The stack targets two general optimization scenarios: fast distributed training across large-scale server clusters and efficient local execution on diverse hardware devices. Our current work spans multiple layers of the system stack, including fast distributed training over RDMA, automatic computation placement across devices, automatic operator batching and kernel fusion, tensor algebra compilation, and sparsity and quantization optimizations.
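To make one of these optimizations concrete, below is a minimal, self-contained sketch of kernel fusion for two elementwise operators (add followed by ReLU). It illustrates the general idea only; it is not code from NNFusion or any of our systems, and all names in it are illustrative.

```python
import numpy as np

# Unfused: two separate "kernels", each reading and writing a full tensor.
# The intermediate result of add is materialized in memory before the ReLU
# runs, costing an extra round trip through memory.
def add_kernel(a, b):
    return a + b

def relu_kernel(x):
    return np.maximum(x, 0.0)

def unfused(a, b):
    tmp = add_kernel(a, b)   # intermediate tensor written out
    return relu_kernel(tmp)  # intermediate tensor read back in

# Fused: one "kernel" computes relu(a + b) per element in a single pass,
# so the intermediate sum never needs to be stored as a full tensor.
def fused(a, b):
    out = np.empty_like(a)
    for i in range(a.size):  # one loop standing in for one fused kernel
        s = a.flat[i] + b.flat[i]
        out.flat[i] = s if s > 0.0 else 0.0
    return out

a = np.random.randn(1024).astype(np.float32)
b = np.random.randn(1024).astype(np.float32)
assert np.allclose(unfused(a, b), fused(a, b))
```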
Open-source Release
Some of our projects have been open-sourced; you are welcome to try them out, contribute, and collaborate with us.
- NNFusion: https://github.com/microsoft/nnfusion
- A flexible and efficient DNN compiler that generates high-performance executables from a DNN model description (e.g., TensorFlow frozen models or ONNX models).
- Antares: https://github.com/microsoft/antares
- An automatic engine for multi-platform kernel generation and optimization; a sketch of the einsum-style computation description such an engine consumes appears after this list.
- And more to come…
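To give a flavor of the einsum-style computation descriptions that a kernel-generation engine like Antares consumes, here is a minimal sketch with reference semantics in NumPy. The expression syntax shown in the comment is modeled loosely on Antares's documented IR; the function name and everything else in this snippet are illustrative assumptions, not Antares's actual API (see the repository for authoritative usage).

```python
import numpy as np

# Hypothetical helper (NOT the Antares API): gives reference semantics for
# an einsum-style expression such as
#     output0[N, M] +=! input0[N, K] * input1[K, M]
# which describes a matrix multiply; "+=!" marks a sum-reduction over the
# axes that appear only on the right-hand side (here, K).
def evaluate_matmul_expr(input0, input1):
    return np.einsum("nk,km->nm", input0, input1)

# Example shapes that a real engine would read from the kernel description.
input0 = np.random.randn(1024, 512).astype(np.float32)
input1 = np.random.randn(512, 512).astype(np.float32)
output0 = evaluate_matmul_expr(input0, input1)
assert output0.shape == (1024, 512)
```

Given such a description plus tensor shapes and dtypes, the engine's job is to search for an efficient implementation of the same computation on each target backend.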
Job Opportunities
- Research Intern
- FTE
People
- Lingxiao Ma, Senior Researcher
- Youshan Miao, Senior Researcher
- Wenxiang Hu, Senior RSDE
- Wei Cui, Senior Researcher
- Fan Yang, Senior Principal Research Manager
- Lidong Zhou, Corporate Vice President, Chief Scientist of Microsoft Asia Pacific R&D Group, Managing Director of Microsoft Research Asia