Intelligent Cloud and Edge Group

NNFusion: A flexible and efficient deep neural network (DNN) compiler that generates a high-performance executable from a DNN model description.
Rammer: A DNN compiler technology that can generate an efficient static spatio-temporal schedule for a DNN at compile time to minimize scheduling overhead.
Roller: A fast tensor compiler for DNNs that generates efficient kernels in seconds using a construction-based approach.
SparTA: An end-to-end optimization system that harvests the speedup available from model sparsity.
Antares: An automatic engine for multi-platform kernel generation and optimization. It supports CPU, CUDA, ROCm, DirectX12, GraphCore, SYCL for CPU/GPU, OpenCL for AMD/NVIDIA, and Android CPU/GPU backends.
Tutel: An optimized Mixture-of-Experts implementation.