About
Yang Wang (王阳) is a Researcher in the Systems and Networking Area at Microsoft Research Asia (MSRA).
His research interests include:
- system optimization based on hardware characteristics
- neural network inference optimization
- topics related to computer architecture
Recent Topic: Vector Quantization/Lookup Tables for Neural Network Inference
VPTQ: Extreme Low-bit Vector Post-Training Quantization for Large Language Models (2024)
VPTQ is an extreme low-bit, weight-only post-training quantization method for Large Language Models based on Vector Quantization. It compresses model weights to under 2 bits, sharply reducing memory requirements, improves inference throughput by up to 1.8× over existing methods, and delivers substantial accuracy improvements across a range of models.
https://github.com/microsoft/VPTQ
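To make the core idea concrete, here is a toy sketch of vector quantization for weight compression — not the VPTQ algorithm itself (which adds much more, e.g. post-training optimization of codebooks against model accuracy), just the basic mechanism of grouping weights into short vectors and replacing each with a small codebook index. All names and parameters below are illustrative.

```python
import numpy as np

def vq_compress(weights, vec_len=4, num_centroids=16, iters=20, seed=0):
    """Toy vector quantization of a weight matrix: split the flattened
    weights into short vectors, learn a small codebook with plain k-means,
    and store only one centroid index per vector."""
    rng = np.random.default_rng(seed)
    vecs = weights.reshape(-1, vec_len)              # group weights into vectors
    codebook = vecs[rng.choice(len(vecs), num_centroids, replace=False)]
    for _ in range(iters):                           # plain k-means refinement
        dists = np.linalg.norm(vecs[:, None, :] - codebook[None, :, :], axis=-1)
        idx = dists.argmin(axis=1)                   # nearest centroid per vector
        for c in range(num_centroids):
            members = vecs[idx == c]
            if len(members):
                codebook[c] = members.mean(axis=0)
    return codebook, idx

def vq_decompress(codebook, idx, shape):
    """Reconstruct an approximate weight matrix from codebook + indices."""
    return codebook[idx].reshape(shape)

# With 16 centroids, each 4-float vector is replaced by a 4-bit index:
# an effective bitwidth of 4 bits / 4 weights = 1 bit per weight.
W = np.random.default_rng(1).normal(size=(64, 64)).astype(np.float32)
codebook, idx = vq_compress(W)
W_hat = vq_decompress(codebook, idx, W.shape)
```

The storage cost drops from 32 bits per weight to one small index per vector plus a fixed-size codebook, which is the lever VPTQ pushes to reach sub-2-bit models.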
PIM-DL: Expanding the Applicability of Commodity DRAM-PIMs for Deep Learning via Algorithm-System Co-Optimization (LUT-NN on Commodity PIM, 2024)
PIM-DL is a framework that replaces compute-heavy GEMM operations with lookup tables, making deep learning inference efficient on commodity DRAM-PIMs. It achieves a 22.6× to 37.1× speedup over traditional GEMM-based inference on DRAM-PIMs and up to a 3.54× speedup over CPU/GPU-based solutions.
https://dl.acm.org/doi/10.1145/3620665.3640376
LUT-NN: Empower Efficient Neural Network Inference with Centroid Learning and Table Lookup (LUT-NN on CPU, 2023)
LUT-NN is the first system to run end-to-end DNN inference via table lookup, replacing most multiply-accumulate computation through centroid learning and optimized memory access. It improves accuracy by up to 92% over prior table-lookup approaches while reducing compute, memory, and power costs, offering a competitive alternative to conventional computation-heavy models.
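The lookup-table idea shared by LUT-NN and PIM-DL can be sketched as a product-quantization-style approximate matmul: snap each sub-vector of the input to its nearest learned centroid, then replace the dot product with a sum of precomputed partial results. This is a minimal illustration of the mechanism, not the LUT-NN implementation (which learns centroids end-to-end and optimizes the memory layout); all names here are assumptions.

```python
import numpy as np

def build_tables(W, centroids):
    """Offline step: per subspace, precompute the dot product of every
    centroid with the matching slice of each weight column.
    Shapes: W is (D, N), centroids is (S, K, d) with S * d == D.
    Returns tables of shape (S, K, N)."""
    S, K, d = centroids.shape
    W_split = W.reshape(S, d, -1)                      # (S, d, N)
    return np.einsum('skd,sdn->skn', centroids, W_split)

def lut_matmul(x, centroids, tables):
    """Online step: approximate x @ W with lookups. Each sub-vector of x
    is mapped to its nearest centroid, and the precomputed partial dot
    products are summed instead of doing multiply-accumulates."""
    S, K, d = centroids.shape
    x_split = x.reshape(S, d)                          # (S, d)
    dists = np.linalg.norm(x_split[:, None, :] - centroids, axis=-1)
    idx = dists.argmin(axis=1)                         # (S,) table indices
    return tables[np.arange(S), idx].sum(axis=0)       # (N,)

# Usage: if every sub-vector of x happens to be a centroid, lookup is exact.
rng = np.random.default_rng(0)
D, N, S, K = 8, 3, 4, 4
d = D // S
W = rng.normal(size=(D, N))
x = rng.normal(size=D)
centroids = rng.normal(size=(S, K, d))
centroids[:, 0] = x.reshape(S, d)                      # plant x's sub-vectors
tables = build_tables(W, centroids)
y = lut_matmul(x, centroids, tables)                   # ≈ x @ W (exact here)
```

The online path performs only distance comparisons, table reads, and additions — which is why it maps well both to CPUs with fast shuffle/gather instructions (LUT-NN) and to DRAM-PIMs, where lookups are far cheaper than GEMM (PIM-DL).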