Efficient DNN Search and Design in Vision

Established: October 1, 2019

Recent achievements by deep neural networks (DNNs) have created opportunities for many real-world applications, e.g., image recognition, object detection, and image generation. Despite exciting research breakthroughs, transferring these state-of-the-art technologies into real products and services remains a grand challenge. One reason is that existing research gives little consideration to speed or computation time, and even less to constraints such as power, energy, memory consumption, and model size. This becomes obvious to many researchers once they work on real projects that require efficient DNN inference. The problem is even more challenging in model training, when we want DNN models that run across many devices with diverse resource constraints. For example, how do we alleviate the human labor of designing diverse DNN models tailored for multiple scenarios (e.g., on edge devices)? How do we reduce the carbon dioxide emissions caused by AI model training, in keeping with the belief in AI for good? In this project, we focus on speeding up deep learning inference through deep neural quantization, and on searching for efficient deep learning deployments through neural architecture search.
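
To make the first focus concrete, here is a minimal sketch of post-training quantization using PyTorch's built-in dynamic quantization; the toy model and layer choices are illustrative, not this project's actual networks. Weights are stored in int8 and dequantized on the fly, which reduces model size and can speed up inference on CPU.

```python
import torch
import torch.nn as nn

# Toy model for illustration only; the project targets vision DNNs.
model = nn.Sequential(
    nn.Linear(512, 256),
    nn.ReLU(),
    nn.Linear(256, 10),
)

# Post-training dynamic quantization: Linear weights are stored in int8
# and dequantized on the fly during inference.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 512)
print(quantized(x).shape)  # torch.Size([1, 10])
```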

In our paper published at NeurIPS 2020, we present a novel neural architecture search framework, called "Cream of the Crop," which automatically distills prioritized paths for one-shot neural architecture search. The discovered architectures achieve superior performance compared to the recent MobileNetV3 and EfficientNet families under aligned settings. In our current project, AutoFormer, we propose a new one-shot architecture search training strategy, named weight entanglement, dedicated to vision transformer search. The searched models surpass recent state-of-the-art models such as ViT and DeiT. In particular, AutoFormer-tiny/small/base achieve 74.7%/81.7%/82.4% top-1 accuracy on ImageNet with 5.7M/22.9M/53.7M parameters, respectively. In the future, we will enhance architecture search with model compression and distillation, which helps in finding efficient and tiny models.
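
As an illustration of the weight-entanglement idea, here is a hypothetical sketch, not the released AutoFormer code: candidate sub-dimensions of a transformer layer do not get separate weights; instead, each sampled sub-network takes a slice of one shared "super" weight matrix, so training any sub-network also updates the overlapping weights used by all the others. The class and parameter names below (EntangledLinear, max_in, max_out) are our own, for illustration only.

```python
import torch
import torch.nn as nn

class EntangledLinear(nn.Module):
    """Linear layer whose candidate widths share one weight matrix.

    Illustrative sketch of weight entanglement: sub-dimensions are
    views into the largest weight, so every sampled sub-network
    trains the portion it overlaps with all other candidates.
    """

    def __init__(self, max_in: int, max_out: int):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(max_out, max_in) * 0.02)
        self.bias = nn.Parameter(torch.zeros(max_out))

    def forward(self, x: torch.Tensor, out_dim: int) -> torch.Tensor:
        in_dim = x.shape[-1]
        # Slice the shared super-weights to the sampled sub-dimensions.
        w = self.weight[:out_dim, :in_dim]
        b = self.bias[:out_dim]
        return nn.functional.linear(x, w, b)

layer = EntangledLinear(max_in=768, max_out=3072)
x = torch.randn(4, 384)        # sampled embedding dim: 384
y = layer(x, out_dim=1536)     # sampled hidden dim: 1536
print(y.shape)                 # torch.Size([4, 1536])
```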

Our efficient AI project has already delivered fundamental vision technologies that support many Microsoft internal teams. In particular, we have been collaborating with Project NNI to accelerate the transfer of Microsoft vision products.

[Diagram: Cream-AutoFormer]

People

Jianlong Fu

Senior Research Manager