Tutorial on Causal Inference and Counterfactual Reasoning

Emre Kiciman; Amit Sharma

ACM KDD International Conference on Knowledge Discovery and Data Mining | August 2018

As computing systems are more frequently and more actively intervening to improve people’s work and daily lives, it is critical to correctly predict and understand the causal effects of these interventions. Conventional machine learning methods, built on pattern recognition and correlational analyses, are insufficient for causal analysis. This tutorial will introduce participants to concepts in causal inference and counterfactual reasoning, drawing from a broad literature on the topic from statistics, social sciences and machine learning.

We first motivate the use of causal inference through examples in domains such as recommender systems, social media datasets, health, education and governance. To tackle the causal questions that arise in these domains, we introduce counterfactual reasoning, the key ingredient that causal analysis depends on, and describe the two most popular frameworks, one based on Bayesian graphical models and the other on potential outcomes. Building on these frameworks, we cover a range of methods suitable for causal inference with large-scale online data, including randomized experiments, observational methods such as matching and stratification, and natural experiment-based methods such as instrumental variables and regression discontinuity. We also focus on best practices for evaluating and validating causal inference techniques, drawing on our own experience.
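As a rough illustration of why a purely correlational comparison can mislead, and of how stratification, one of the observational methods named above, can help, here is a minimal sketch on simulated data. It is not taken from the tutorial's notebooks. The data-generating process, the effect size of 1.0, and all function names (`simulate`, `naive_estimate`, `stratified_estimate`) are assumptions made for this example only.

```python
import random


def simulate(n=20000, true_effect=1.0, seed=0):
    """Generate (confounder, treatment, outcome) rows with a known effect.

    Hypothetical setup: a binary confounder z (say, whether a user is
    highly active) raises both the chance of treatment and the outcome.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = 1 if rng.random() < 0.5 else 0
        p_treat = 0.8 if z else 0.2            # confounder drives treatment
        t = 1 if rng.random() < p_treat else 0
        y = true_effect * t + 2.0 * z + rng.gauss(0, 1)  # and the outcome
        rows.append((z, t, y))
    return rows


def mean(xs):
    return sum(xs) / len(xs)


def naive_estimate(rows):
    """Difference in mean outcome between treated and untreated rows."""
    treated = [y for _, t, y in rows if t == 1]
    control = [y for _, t, y in rows if t == 0]
    return mean(treated) - mean(control)


def stratified_estimate(rows):
    """Average the within-stratum differences, weighted by stratum size."""
    total = len(rows)
    estimate = 0.0
    for level in sorted({z for z, _, _ in rows}):
        stratum = [r for r in rows if r[0] == level]
        treated = [y for _, t, y in stratum if t == 1]
        control = [y for _, t, y in stratum if t == 0]
        estimate += (len(stratum) / total) * (mean(treated) - mean(control))
    return estimate


rows = simulate()
naive = naive_estimate(rows)
stratified = stratified_estimate(rows)
print(f"naive: {naive:.2f}  stratified: {stratified:.2f}  true effect: 1.00")
```

In this simulation the naive comparison overstates the effect because treated rows are disproportionately drawn from the high-outcome stratum, while the stratified estimate should land close to the simulated effect. The sketch assumes the confounder is observed and binary. With unobserved confounding, stratification alone would not recover the effect.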

We demonstrate these techniques in Jupyter notebooks, showing how core concepts translate to empirical work. Throughout, we emphasize considerations that arise when working with large-scale data from online systems, such as logs of user interactions or social data. The goal of this tutorial is to help you understand the basics of causal inference, apply the most common causal inference methods appropriately, and recognize situations where more complex methods are required.

Sections

Introduction: Patterns and predictions are not enough
Methods: Conditioning-based methods and natural experiments
Considerations: Special considerations with large-scale and network data
Broader Landscape: Heterogeneous treatment effects, machine learning and causal discovery
References: Further reading