About
I received my B.S. degree in computer science from ShanghaiTech University and my Ph.D. degree in electronic and computer engineering from the Hong Kong University of Science and Technology. My research focuses on interpreting the mechanisms of foundation models, analyzing them mathematically, and applying these mechanisms to push their limits. Our work consists of three main components:
Controlled Experiments: We design rigorous controlled experiments, training smaller models from scratch to discover universal laws that extend beyond current foundation models.
Mechanistic Interpretations: We perform mechanistic interpretations of pretrained large models, transforming them into “grey boxes” to enhance our understanding of their inner workings.
- arXiv 2411.14982 | Large Multi-modal Models Can Interpret Features in Large Multi-modal Models
- ICLR’25 | How Do Large Language Models Understand Graph Patterns? A Benchmark for Graph Pattern Comprehension
- NeurIPS’24 | Understanding and Improving Training-free Loss-based Diffusion Guidance
- ICLR’23 | SIMPLE: Specialized Model-Sample Matching for Domain Generalization
- ICLR’23 (Oral) | Sparse Mixture-of-Experts are Domain Generalizable Learners
Applications: We use the discovered laws and interpretations to develop novel training paradigms for foundation models. Currently, our applications include interdisciplinary collaborative projects in:
- LLM reasoning:
- Embodied AI:
- AI for Science:
- AI for Communication/Networking/Sensing:
- IEEE COMMAG’24 | Large Language Models Empowered Autonomous Edge AI for Connected Intelligence
- IEEE JSTSP’23 | An Adaptive and Robust Deep Learning Framework for THz Ultra-Massive MIMO Channel Estimation
- IEEE JSAC’20 | Graph Neural Networks for Scalable Radio Resource Management: Architecture Design and Theoretical Analysis
In addition to my primary research, I have a strong interest in system-level work that enables LLM training on less powerful GPUs.
This complementary work aims to make LLMs accessible to more researchers and practitioners.