With Shared Microexponents, A Little Shifting Goes a Long Way
- Bita Darvish Rouhani
- Ritchie Zhao
- Venmugil Elango
- Rasoul Shafipour
- Mathew Hall
- Maral Mesmakhosroshahi
- Ankit More
- Levi Melnick
- Maximilian Golub
- Girish Varatkar
- Lai Shao
- Gaurav Kolhe
- Dimitry Melts
- Jasmine Klar
- Renee L'Heureux
- Matt Perry
- Doug Burger
- Eric Chung
- Zhaoxia (Summer) Deng
- Sam Naghshineh
- Jongsoo Park
- Maxim Naumov
ISCA '23: Proceedings of the 50th Annual International Symposium on Computer Architecture
This paper introduces Block Data Representations (BDR), a framework for exploring and evaluating a wide spectrum of narrow-precision formats for deep learning. BDR enables comparison of popular quantization standards, and through this framework new formats based on shared microexponents (MX) are identified that outperform other state-of-the-art quantization approaches, including narrow-precision floating-point and block floating-point. MX utilizes multiple levels of quantization scaling, with ultra-fine scaling factors based on shared microexponents in hardware. The effectiveness of MX is demonstrated on real-world models, including large-scale generative pretraining and inferencing, and production-scale recommendation systems.
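The multi-level scaling idea described in the abstract can be illustrated with a minimal sketch: a coarse power-of-two scale shared across a block of elements, refined by a small shared shift (a "microexponent") per sub-block, with each element stored as a narrow mantissa relative to that combined scale. The block size, sub-block size, bit widths, and the `quantize_two_level` helper below are illustrative assumptions for exposition, not the paper's exact MX configurations or hardware datapath.

```python
import numpy as np

def quantize_two_level(x, block=16, subblock=2, mantissa_bits=4, micro_bits=1):
    """Toy two-level block quantizer (illustrative parameters, not the MX spec).

    Level 1: one shared power-of-two scale per block of `block` elements.
    Level 2: a `micro_bits`-bit shared shift per sub-block of `subblock` elements.
    Each element is rounded to a symmetric integer mantissa of `mantissa_bits` bits.
    Returns the dequantized approximation of x.
    """
    x = np.asarray(x, dtype=np.float64)
    assert x.size % block == 0 and block % subblock == 0
    out = np.empty_like(x)
    qmax = 2 ** (mantissa_bits - 1) - 1          # symmetric mantissa range
    max_shift = 2 ** micro_bits - 1              # largest microexponent shift

    for b in range(0, x.size, block):
        blk = x[b:b + block]
        # Coarse scale: smallest power of two covering the block's max magnitude.
        blk_max = np.max(np.abs(blk))
        e_block = int(np.ceil(np.log2(blk_max))) if blk_max > 0 else 0
        for s in range(0, block, subblock):
            sub = blk[s:s + subblock]
            # Fine scale: shift the block exponent down toward the sub-block's
            # own magnitude, but only by what the microexponent can encode.
            sub_max = np.max(np.abs(sub))
            e_sub = int(np.ceil(np.log2(sub_max))) if sub_max > 0 else e_block
            shift = int(np.clip(e_block - e_sub, 0, max_shift))
            scale = 2.0 ** (e_block - shift)     # effective per-sub-block scale
            q = np.clip(np.round(sub / scale * qmax), -qmax, qmax)
            out[b + s:b + s + subblock] = q / qmax * scale
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    v = rng.normal(size=64)
    v_hat = quantize_two_level(v)
    print("mean abs quantization error:", np.mean(np.abs(v - v_hat)))
```

The point of the second level is visible in the `shift` computation: sub-blocks whose values are much smaller than the block maximum get a finer effective scale (at the cost of only `micro_bits` extra shared bits), which is the intuition behind ultra-fine scaling via shared microexponents.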