About
Yang Wang (王阳) is a Researcher in the Systems and Networking Area at Microsoft Research Asia (MSRA).
His research interests include:
- system optimization based on hardware characteristics
- neural network inference optimization
- topics related to computer architecture
Recent Topic: Vector Quantization/Lookup Tables for Neural Network Inference
VPTQ: Extreme Low-bit Vector Post-Training Quantization for Large Language Models (2024)
VPTQ is an extreme low-bit, weight-only post-training quantization method for Large Language Models based on Vector Quantization. It compresses model weights to under 2 bits, sharply reducing memory requirements, improves inference throughput by up to 1.8× over existing methods, and delivers substantial accuracy improvements across a range of models.
https://github.com/microsoft/VPTQ
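To make the core idea concrete, here is a toy sketch of vector quantization for weight compression — not the VPTQ algorithm itself (which adds much more, e.g. post-training optimization of codebooks against model accuracy), just the basic mechanism of grouping weights into short vectors and replacing each with a small codebook index. All names and parameters below are illustrative.

```python
import numpy as np

def vq_compress(weights, vec_len=4, num_centroids=16, iters=20, seed=0):
    """Toy vector quantization of a weight matrix: split the flattened
    weights into short vectors, learn a small codebook with plain k-means,
    and store only one centroid index per vector."""
    rng = np.random.default_rng(seed)
    vecs = weights.reshape(-1, vec_len)              # group weights into vectors
    codebook = vecs[rng.choice(len(vecs), num_centroids, replace=False)]
    for _ in range(iters):                           # plain k-means refinement
        dists = np.linalg.norm(vecs[:, None, :] - codebook[None, :, :], axis=-1)
        idx = dists.argmin(axis=1)                   # nearest centroid per vector
        for c in range(num_centroids):
            members = vecs[idx == c]
            if len(members):
                codebook[c] = members.mean(axis=0)
    return codebook, idx

def vq_decompress(codebook, idx, shape):
    """Reconstruct an approximate weight matrix from codebook + indices."""
    return codebook[idx].reshape(shape)

# With 16 centroids, each 4-float vector is replaced by a 4-bit index:
# an effective bitwidth of 4 bits / 4 weights = 1 bit per weight.
W = np.random.default_rng(1).normal(size=(64, 64)).astype(np.float32)
codebook, idx = vq_compress(W)
W_hat = vq_decompress(codebook, idx, W.shape)
```

The storage cost drops from 32 bits per weight to one small index per vector plus a fixed-size codebook, which is the lever VPTQ pushes to reach sub-2-bit models.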
PIM-DL: Expanding the Applicability of Commodity DRAM-PIMs for Deep Learning via Algorithm-System Co-Optimization (LUT-NN on Commodity PIM, 2024)
PIM-DL is a framework that replaces compute-heavy GEMM operations with lookup tables, making deep learning inference efficient on commodity DRAM-PIMs. It achieves a 22.6× to 37.1× speedup over traditional GEMM-based inference on DRAM-PIMs and up to a 3.54× speedup over CPU/GPU-based solutions.
https://dl.acm.org/doi/10.1145/3620665.3640376
LUT-NN: Empower Efficient Neural Network Inference with Centroid Learning and Table Lookup (LUT-NN on CPU, 2023)
LUT-NN is the first system to run end-to-end DNN inference via table lookup, replacing most multiply-accumulate computation through centroid learning and optimized memory access. It improves accuracy by up to 92% over prior table-lookup approaches while reducing compute, memory, and power costs, offering a competitive alternative to conventional computation-heavy models.
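The lookup-table idea shared by LUT-NN and PIM-DL can be sketched as a product-quantization-style approximate matmul: snap each sub-vector of the input to its nearest learned centroid, then replace the dot product with a sum of precomputed partial results. This is a minimal illustration of the mechanism, not the LUT-NN implementation (which learns centroids end-to-end and optimizes the memory layout); all names here are assumptions.

```python
import numpy as np

def build_tables(W, centroids):
    """Offline step: per subspace, precompute the dot product of every
    centroid with the matching slice of each weight column.
    Shapes: W is (D, N), centroids is (S, K, d) with S * d == D.
    Returns tables of shape (S, K, N)."""
    S, K, d = centroids.shape
    W_split = W.reshape(S, d, -1)                      # (S, d, N)
    return np.einsum('skd,sdn->skn', centroids, W_split)

def lut_matmul(x, centroids, tables):
    """Online step: approximate x @ W with lookups. Each sub-vector of x
    is mapped to its nearest centroid, and the precomputed partial dot
    products are summed instead of doing multiply-accumulates."""
    S, K, d = centroids.shape
    x_split = x.reshape(S, d)                          # (S, d)
    dists = np.linalg.norm(x_split[:, None, :] - centroids, axis=-1)
    idx = dists.argmin(axis=1)                         # (S,) table indices
    return tables[np.arange(S), idx].sum(axis=0)       # (N,)

# Usage: if every sub-vector of x happens to be a centroid, lookup is exact.
rng = np.random.default_rng(0)
D, N, S, K = 8, 3, 4, 4
d = D // S
W = rng.normal(size=(D, N))
x = rng.normal(size=D)
centroids = rng.normal(size=(S, K, d))
centroids[:, 0] = x.reshape(S, d)                      # plant x's sub-vectors
tables = build_tables(W, centroids)
y = lut_matmul(x, centroids, tables)                   # ≈ x @ W (exact here)
```

The online path performs only distance comparisons, table reads, and additions — which is why it maps well both to CPUs with fast shuffle/gather instructions (LUT-NN) and to DRAM-PIMs, where lookups are far cheaper than GEMM (PIM-DL).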