About
I received my B.S. degree in computer science from ShanghaiTech University and my Ph.D. degree in electronic and computer engineering from the Hong Kong University of Science and Technology. My research focuses on interpreting the mechanisms of foundation models, analyzing them mathematically, and applying these mechanisms to push their limits. Our work consists of three main components:
Controlled Experiments: We design rigorous controlled experiments, training smaller models from scratch to discover universal laws that extend beyond current foundation models.
Mechanistic Interpretations: We perform mechanistic interpretations of pretrained large models, transforming them into “grey boxes” to enhance our understanding of their inner workings.
- arXiv 2411.14982 | Large Multi-modal Models Can Interpret Features in Large Multi-modal Models
- ICLR’25 | How Do Large Language Models Understand Graph Patterns? A Benchmark for Graph Pattern Comprehension
- NeurIPS’24 | Understanding and Improving Training-free Loss-based Diffusion Guidance
- ICLR’23 | SIMPLE: Specialized Model-Sample Matching for Domain Generalization
- ICLR’23 (Oral) | Sparse Mixture-of-Experts are Domain Generalizable Learners
Applications: We use the discovered laws and interpretations to develop novel training paradigms for foundation models. Currently, our applications include interdisciplinary collaborative projects in:
- LLM reasoning:
- Embodied AI:
- AI for Science:
- AI for Communication/Networking/Sensing:
- IEEE COMMAG’24 | Large Language Models Empowered Autonomous Edge AI for Connected Intelligence
- IEEE JSTSP’23 | An Adaptive and Robust Deep Learning Framework for THz Ultra-Massive MIMO Channel Estimation
- IEEE JSAC’20 | Graph Neural Networks for Scalable Radio Resource Management: Architecture Design and Theoretical Analysis
In addition to my primary research, I have a strong interest in system-level work that enables LLM training on less powerful GPUs.
This complementary work aims to make LLMs accessible to more researchers and practitioners.