From SqueezeNet to SqueezeBERT: Developing Efficient Deep Neural Networks

Deep neural networks have been trained to interpret images and text with increasing accuracy. In many cases, these accuracy gains come from larger and more computationally intensive models. Such models tend to incur high latency during inference, especially when deployed on smartphones and edge devices. In this talk, we present two lines of work that focus on reducing the cost of neural network inference on edge devices. First, we review the last four years of progress in the computer vision (CV) community towards developing efficient neural networks for edge devices, ranging from early work such as SqueezeNet to recent work leveraging neural architecture search. Second, we present SqueezeBERT, a mobile-optimized neural network design for natural language processing (NLP) that draws on ideas from efficient CV network design. SqueezeBERT achieves a 4.3x speedup over BERT-base on a Pixel 3 smartphone. Finally, we believe that SqueezeBERT is just the beginning of several years of fruitful research in the NLP community on efficient neural architectures.

Date:
Speakers: Forrest Iandola, Sujeeth Bharadwaj
Affiliation: University of California, Berkeley; Microsoft Research

Series: Microsoft Research Talks