With Shared Microexponents, A Little Shifting Goes a Long Way

  • Bita Darvish Rouhani,
  • Ritchie Zhao,
  • Venmugil Elango,
  • Rasoul Shafipour,
  • Mathew Hall,
  • Ankit More,
  • Levi Melnick,
  • Maximilian Golub,
  • Girish Varatkar,
  • Lai Shao,
  • Gaurav Kolhe,
  • Dimitry Melts,
  • Jasmine Klar,
  • Renee L'Heureux,
  • Matt Perry,
  • Eric Chung,
  • Zhaoxia (Summer) Deng,
  • Sam Naghshineh,
  • Jongsoo Park,
  • Maxim Naumov

ISCA '23: Proceedings of the 50th Annual International Symposium on Computer Architecture

This paper introduces Block Data Representations (BDR), a framework for exploring and evaluating a wide spectrum of narrow-precision formats for deep learning. BDR enables comparison of popular quantization standards, and through this framework new formats based on shared microexponents (MX) are identified that outperform other state-of-the-art quantization approaches, including narrow-precision floating-point and block floating-point. MX utilizes multiple levels of quantization scaling, with ultra-fine scaling factors based on shared microexponents in the hardware. The effectiveness of MX is demonstrated on real-world models, including large-scale generative pretraining and inference, and production-scale recommendation systems.
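
To make the idea of multi-level quantization scaling concrete, the sketch below fake-quantizes a tensor with a two-level scheme in the spirit of shared microexponents: a coarse exponent shared by each block, refined by a few-bit microexponent shared by each small subblock. This is a minimal illustration only; the function name `quantize_mx_like`, the block and subblock sizes, and the bit widths are assumed placeholders and do not reproduce the exact MX formats defined in the paper.

```python
import numpy as np


def quantize_mx_like(x, block_size=16, subblock_size=2,
                     micro_exp_bits=1, mantissa_bits=4):
    """Fake-quantize `x` with a two-level shared-scale scheme.

    Each block of `block_size` values shares a coarse exponent, and each
    subblock of `subblock_size` values shares a small microexponent that
    offsets (refines) the coarse scale. All sizes and bit widths here are
    illustrative placeholders, not the MX formats defined in the paper.
    """
    orig_shape = np.shape(x)
    x = np.asarray(x, dtype=np.float64).ravel()
    out = np.empty_like(x)
    qmax = 2 ** mantissa_bits - 1  # largest integer mantissa magnitude

    for b in range(0, x.size, block_size):
        block = x[b:b + block_size]
        # Coarse level: one exponent shared by the whole block.
        block_max = float(np.max(np.abs(block))) or 1.0
        shared_exp = int(np.ceil(np.log2(block_max)))

        for s in range(0, block.size, subblock_size):
            sub = block[s:s + subblock_size]
            sub_max = float(np.max(np.abs(sub))) or 1.0
            # Fine level: a few-bit microexponent shared by the subblock,
            # shifting the coarse scale down for small-magnitude subblocks.
            micro = int(np.clip(shared_exp - int(np.ceil(np.log2(sub_max))),
                                0, 2 ** micro_exp_bits - 1))
            scale = 2.0 ** (shared_exp - micro)
            # Quantize mantissas against the combined scale, then dequantize.
            q = np.clip(np.round(sub / scale * qmax), -qmax, qmax)
            out[b + s:b + s + sub.size] = q / qmax * scale

    return out.reshape(orig_shape)
```

For example, `quantize_mx_like(np.random.randn(64))` round-trips a vector through this format; subblocks with smaller magnitudes receive a finer effective scale than a single block-wide exponent alone would allow, which is the intuition behind combining coarse per-block scaling with ultra-fine shared-microexponent scaling.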