About
augmented imodels – use LLMs to build a transparent model
imodels – build interpretable models in the style of scikit-learn
explanation penalization – regularize explanations to align models with prior knowledge
adaptive wavelet distillation – replace neural nets with simple, performant wavelet models
LLM steering. Interpretability tools can provide ways to better guide and use LLMs.
tree prompting – improve black-box few-shot text classification with decision trees
attention steering – guide LLMs by emphasizing specific input spans
interpretable autoprompting – automatically find fluent natural-language prompts
Neuroscience. Since joining MSR, I have been focused on leveraging LLMs to understand how the human brain represents language (using fMRI in collaboration with the Huth lab at UT Austin).
explanation-mediated validation – build and test fMRI explanations using LLM-generated stimuli
qa embeddings – build interpretable fMRI encoding models by asking yes/no questions to LLMs
summarize & score explanations – generate natural-language explanations of fMRI encoding models
clinical self-verification – self-verification improves performance and interpretability of clinical information extraction
clinical rule vetting – stress-test the performance of a clinical decision instrument for intra-abdominal injury
My PhD at UC Berkeley (advised by Bin Yu) focused on working with scientists and doctors to develop interpretations for scientific domains.
Internships / collaborations
If you want to chat about research (or are interested in interning at MSR), feel free to reach out over email!
Previously, I’ve been lucky to help mentor some wonderful students:
Recent publications
Explaining Patterns in Data with Language Models via Interpretable Autoprompting
Large language models (LLMs) have displayed an impressive ability to harness natural language to perform complex tasks. In this work, we explore whether we can leverage this learned ability to find and explain patterns in data. Specifically, given a pre-trained…