Causal Transfer Random Forest: Leveraging Observational and Randomization Studies
- Shuxi Zeng
- Emre Kiciman
- Denis Charles
- Joel Pfeiffer
- Murat Ali Bayir
NeurIPS 2019 Workshop, “Do the right thing”: machine learning and causal inference for improved decision making
It is often critical for prediction models to be robust to distributional shifts. Online advertising platforms, for example, evaluate system and policy changes using models that predict whether users will click on shown advertisements. Click prediction models built with conventional machine learning methods, however, become unreliable when a new system or policy significantly shifts the feature distribution away from the available training data, which is usually large-scale observational data from the online system. In this paper, we describe the causal transfer random forest (CTRF), which combines existing training data with a small amount of data from randomized experiments to make robust predictions under distributional shifts. We learn the CTRF tree structures from the randomized data, which breaks spurious correlations between input features and prediction targets, and then calibrate each tree node with both the large-scale observational data and the randomized data. We evaluate the proposed method on data from radical exploration flights in an online ad platform and find that the CTRF outperforms alternative approaches.
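The two-stage idea described in the abstract, learning tree structure from randomized data and then calibrating node estimates with pooled data, can be sketched for a single tree as follows. This is an illustrative sketch, not the paper's implementation: the dataset shapes, the synthetic data-generating process, and the use of scikit-learn's `DecisionTreeClassifier` are all assumptions made here for demonstration.

```python
# Sketch of the CTRF idea for one tree (hypothetical, not the paper's code):
#  1) learn the tree structure from a small randomized sample, where
#     features are free of spurious correlations with the label,
#  2) re-estimate (calibrate) each leaf's click rate from the pooled
#     randomized + observational data.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Small randomized-experiment sample (illustrative sizes).
X_rand = rng.normal(size=(200, 5))
y_rand = (X_rand[:, 0] + rng.normal(size=200) > 0).astype(int)

# Larger observational sample, which may carry spurious correlations.
X_obs = rng.normal(size=(5000, 5))
y_obs = (X_obs[:, 0] + rng.normal(size=5000) > 0).astype(int)

# Step 1: tree structure comes from the randomized data only.
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_rand, y_rand)

# Step 2: calibrate every leaf with the pooled data.
X_all = np.vstack([X_rand, X_obs])
y_all = np.concatenate([y_rand, y_obs])
leaf_of_row = tree.apply(X_all)  # leaf index for each pooled row
leaf_rate = {leaf: y_all[leaf_of_row == leaf].mean()
             for leaf in np.unique(leaf_of_row)}

def predict_ctr(X):
    """Predict click probability via the calibrated leaf rates."""
    return np.array([leaf_rate[leaf] for leaf in tree.apply(X)])

print(predict_ctr(X_obs[:5]))
```

A forest version would repeat both steps over bootstrap samples and average the per-tree predictions; the calibration step is what lets the small randomized sample control the structure while the large observational sample keeps the node estimates statistically stable.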