About
I am a Senior Principal Researcher at Microsoft Research Redmond, where I lead the AI for Industry team in Research for Industries.
My research interests are broadly in large-scale applications of AI; AI’s impacts on people and society; and causal machine learning algorithms and generative AI systems. I am working to broaden the use of causal methods for decision-making across many application domains, and to improve current applications of correlational machine learning through causal insights. My work uses machine learning methods to scale up conventional causal inference techniques to larger and higher-dimensional datasets; to adapt causal inference methods to new settings; and to improve the robustness and reduce the bias of prediction and classification algorithms using causal or causally inspired approaches.
On AI’s implications for society, my work promotes positive applications of AI and strives to mitigate its negative consequences. My projects include work at the intersection of security and machine learning, studying new attacks on, and defenses for, security-critical AI-driven systems in an end-to-end setting; questions of data bias and its implications; infrastructure and methods for developing and maintaining privacy-preserving AI-driven systems; misinformation; and other topics.
I have a strong interest in computational social science questions and social media analyses, especially those that require a causal understanding of phenomena in health and mental health; issues of data bias; and how new technologies affect our awareness of the world and enable new kinds of information discovery and retrieval.
My past research has included the reliability, architecture, and operations of distributed systems, including some of the first work to apply machine learning methods to challenges of fault detection and diagnosis in large-scale systems; monitoring and optimization of web applications; and various information retrieval-related tasks, such as entity-linking and using social context to inform document ranking. I received my Ph.D. and my M.S. from Stanford University, and my B.S. in Electrical Engineering and Computer Science from U.C. Berkeley.
Highlights
AI Controller Interface: Generative AI with a lightweight, LLM-integrated VM
The Artificial Intelligence Controller Interface (AICI) makes it easy to build and experiment with new strategies to improve LLM generations through AI Controllers that are tightly integrated with the LLM inference engine. Controllers are flexible programs capable of implementing constrained decoding, dynamic editing of prompts and generated text, and coordinating execution across multiple, parallel generations. Controllers incorporate custom logic during the token-by-token decoding and maintain state during an LLM request, and can support diverse strategies, from programmatic or query-based decoding to multi-agent conversations.
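To make the idea of a controller that runs during token-by-token decoding more concrete, here is a minimal sketch in plain Python. It is not the AICI API: the toy vocabulary, the `toy_logits` stand-in for an LLM, and the `ChoiceController` and `decode` names are all hypothetical. The controller keeps state across decoding steps and implements a simple form of constrained decoding by masking disallowed tokens before the next token is chosen greedily.

```python
import math
from typing import Dict, List, Set

# Hypothetical toy vocabulary; a real LLM would have tens of thousands of tokens.
VOCAB = ["yes", "no", "maybe", "<eos>"]


def toy_logits(prefix: List[str]) -> Dict[str, float]:
    """Hypothetical stand-in for an LLM's next-token scores."""
    if not prefix:
        # Unconstrained, this "model" would answer "maybe".
        return {"yes": 1.0, "no": 0.5, "maybe": 2.0, "<eos>": -1.0}
    # After one token, it prefers to stop.
    return {"yes": 0.0, "no": 0.0, "maybe": 0.5, "<eos>": 3.0}


class ChoiceController:
    """Hypothetical controller: restricts the output to one of a fixed set of answers."""

    def __init__(self, allowed: List[str]):
        self.allowed = set(allowed)
        self.generated: List[str] = []  # state maintained across decoding steps

    def allowed_tokens(self) -> Set[str]:
        # Before an answer is emitted, only the allowed answers; afterwards, only end-of-sequence.
        return set(self.allowed) if not self.generated else {"<eos>"}

    def on_token(self, token: str) -> None:
        self.generated.append(token)


def decode(controller: ChoiceController, max_steps: int = 5) -> List[str]:
    out: List[str] = []
    for _ in range(max_steps):
        logits = toy_logits(out)
        allowed = controller.allowed_tokens()
        # Mask disallowed tokens with -inf, then pick the highest-scoring token (greedy decoding).
        masked = {t: (s if t in allowed else -math.inf) for t, s in logits.items()}
        token = max(masked, key=masked.get)
        if token == "<eos>":
            break
        controller.on_token(token)
        out.append(token)
    return out


print(decode(ChoiceController(["yes", "no"])))  # ['yes'] even though the toy model prefers "maybe"
```

Masking logits is only one way a controller might steer generation; the description above also mentions dynamic prompt editing and coordinating parallel generations, which this sketch does not attempt.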
Causal Reasoning and Large Language Models: Opening a New Frontier for Causality
The causal capabilities of large language models (LLMs) are a matter of significant debate, with critical implications for the use of LLMs in societally impactful domains such as medicine, science, law, and policy. We further our understanding of LLMs and their causal implications, considering the distinctions between different types of causal reasoning tasks, as well as the entangled threats of construct and measurement validity.
DoWhy evolves to independent PyWhy model to help causal inference grow
Identifying causal effects is an integral part of scientific inquiry. It helps us understand everything from educational outcomes to the effects of social policies to risk factors for diseases. Questions of cause-and-effect are also critical for the design and data-driven…
Foundations of causal inference and its impacts on machine learning
Many key data science tasks are about decision-making. They require understanding the causes of an event and how to take action to improve future outcomes. Machine learning (ML) models rely on correlational patterns to predict the answer to a question…
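A small worked example can show why correlational patterns can mislead decision-making. The sketch below uses hypothetical counts in which a confounder `z` drives both treatment `t` and outcome `y`. It compares the naive difference in outcome rates with the standard backdoor adjustment, sum over z of P(z) * (E[Y | T=1, z] - E[Y | T=0, z]), assuming `z` is the only confounder. This is a textbook illustration in plain Python, not the DoWhy API.

```python
from collections import defaultdict
from typing import List, Tuple

# Hypothetical toy data: (z, t, y, count). z is a binary confounder, t a binary
# treatment, y a binary outcome. Within each z-stratum, treatment raises the
# outcome rate by exactly 0.1, but treated units are concentrated in z = 1.
COUNTS = [
    (0, 1, 1, 6), (0, 1, 0, 14), (0, 0, 1, 16), (0, 0, 0, 64),
    (1, 1, 1, 64), (1, 1, 0, 16), (1, 0, 1, 14), (1, 0, 0, 6),
]

Row = Tuple[int, int, int]


def expand(counts) -> List[Row]:
    """Turn (z, t, y, count) tuples into one row per unit."""
    return [(z, t, y) for z, t, y, n in counts for _ in range(n)]


def mean_y(rows: List[Row]) -> float:
    return sum(y for _, _, y in rows) / len(rows)


def naive_difference(rows: List[Row]) -> float:
    """Correlational estimate: E[Y | T=1] - E[Y | T=0], ignoring z."""
    treated = [r for r in rows if r[1] == 1]
    control = [r for r in rows if r[1] == 0]
    return mean_y(treated) - mean_y(control)


def backdoor_adjusted_effect(rows: List[Row]) -> float:
    """Average the within-stratum differences, weighted by P(z)."""
    by_z = defaultdict(list)
    for r in rows:
        by_z[r[0]].append(r)
    effect = 0.0
    for stratum in by_z.values():
        p_z = len(stratum) / len(rows)
        treated = [r for r in stratum if r[1] == 1]
        control = [r for r in stratum if r[1] == 0]
        effect += p_z * (mean_y(treated) - mean_y(control))
    return effect


rows = expand(COUNTS)
print(round(naive_difference(rows), 3))          # 0.4: inflated by confounding
print(round(backdoor_adjusted_effect(rows), 3))  # 0.1: the effect built into the toy data
```

The naive comparison overstates the effect fourfold because treated units are mostly in the high-outcome stratum; adjusting for `z` recovers the 0.1 effect that was built into the data.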
DoWhy: Causal Reasoning for Designing and Evaluating Interventions
Today's computing systems can be thought of as interventions in people's work and daily lives. But what are the outcomes of these interventions, and how can we tune these systems for desired outcomes? In this project we are building methods…
Social Data: Biases, Methodological Pitfalls, and Ethical Boundaries
Social data in digital form, including user-generated content, expressed or implicit relations between people, and behavioral traces, are at the core of popular applications and platforms, driving the research agenda of many researchers. The promises of social data are many,…