{"id":707875,"date":"2020-12-01T10:06:53","date_gmt":"2020-12-01T18:06:53","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?p=707875"},"modified":"2022-05-19T11:58:08","modified_gmt":"2022-05-19T18:58:08","slug":"adversarial-machine-learning-and-instrumental-variables-for-flexible-causal-modeling","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/research\/blog\/adversarial-machine-learning-and-instrumental-variables-for-flexible-causal-modeling\/","title":{"rendered":"Adversarial machine learning and instrumental variables for flexible causal modeling"},"content":{"rendered":"\n

We are going through a new shift in machine learning (ML), where ML models are increasingly being used to automate decision-making in a multitude of domains: what personalized treatment should be administered to a patient, what discount should be offered to an online customer, and other important decisions that can greatly impact people's lives.

The machine learning revolution was primarily driven by problems that are distant from such decision-making scenarios: predicting what an image depicts, what an English text means, or what the next frame in a video sequence looks like. This begs the question: is the same hammer that enabled these high-accuracy predictive models equally suited to driving the nail of automated decision-making? Enter the nascent field of causal machine learning.

Making good decisions requires uncovering causal relationships from data. Causal ML attempts to bridge the gap between prediction and causal inference by leveraging the recent methodological, technological, and theoretical advances in ML for predictive problems. The field redirects these advances to address causal problems, often using ML out of the box by reducing the causal problem to a sequence of prediction problems that are carefully combined to uncover the causal relationship of interest. You can learn more about this topic and related work across Microsoft Research at the Causality and Machine Learning page.

Our work in "Minimax Estimation of Conditional Moment Models," accepted at the 34th Conference on Neural Information Processing Systems (NeurIPS 2020), and its earlier incarnation, "Adversarial Generalized Method of Moments," are part of this rapidly growing literature in causal ML. In these two works, with fellow Microsoft Research New England researchers Greg Lewis and Lester Mackey along with MIT student Nishanth Dikkala, we propose a novel way of estimating flexible causal models with machine learning from non-experimental data, blending ideas from instrumental variable (IV) estimation in econometrics and generative adversarial networks in machine learning. We've made our IV estimation methods open source on our GitHub page.

On the technical level, our work transforms the IV problem into a min-max loss minimization problem that is addressable by ML techniques, and it develops novel statistical learning theory building on recent work in statistical ML. Before we get into our advances with causal inference and adversarial ML, let's take a look at why causal inference can lead to ML models that are better suited for decision-making.
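To make the min-max idea concrete, here is a schematic version of the formulation (our simplified notation, not the exact objective or penalties from the paper): with T the treatment, Y the outcome, and Z the instrument, the IV assumptions imply a conditional moment restriction on the causal function h, and that restriction can be enforced by playing h against an adversarially chosen test function f of the instrument.

```latex
% Conditional moment restriction implied by a valid instrument Z:
% the residual Y - h(T) has zero mean conditional on Z.
\mathbb{E}\bigl[\, Y - h(T) \mid Z \,\bigr] = 0

% Schematic adversarial reformulation: the modeler chooses h, an adversary
% chooses a test function f of the instrument, and a norm penalty on f
% keeps the inner maximization well posed.
\min_{h \in \mathcal{H}} \; \max_{f \in \mathcal{F}} \;
    \mathbb{E}\bigl[(Y - h(T))\, f(Z)\bigr] \;-\; \lambda\, \lVert f \rVert^{2}
```

If h violates the moment restriction at some value of the instrument, the adversary can pick an f that exposes the violation and drives the objective up; this is the sense in which IV estimation becomes a game addressable with adversarial ML machinery.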


Decision-making is about predicting counterfactuals

At its core, the problem with using opaque ML models to enable automated decisions is the discrepancy between correlation and causation. Most ML models achieve high accuracy by taking advantage, in an automated manner, of all the correlations that exist in the data to predict what label or outcome is most associated with a given set of features (or variables). Making an optimal decision roughly corresponds to making an intervention and changing one of these features. To make the optimal decision, we need to understand what the outcome would be had we intervened and changed one of the features of a sample. Predicting such counterfactual outcomes requires uncovering the causal relationship between an intervening feature and an outcome, that is, understanding the causal effect of that feature on the outcome. Building models of these causal relationships from data is at the core of the overarching field of causal inference.

The problem is that we never observe this counterfactual quantity in the data. Using predictive ML models to essentially "impute" these unobserved quantities runs the risk of relying on correlations in the data that will be broken by interventions, leading to erroneous answers to such "what-if" questions. To make optimal decisions, we need to build causal models that predict counterfactual quantities: in other words, what would have happened to the outcome had the decision maker intervened and changed the treatment (also known as the action, target feature, or driver) to a different, hypothetical value from the one observed in the data.

The big hurdle of causal inference: unobserved confounding

The main hurdle in uncovering causal relationships from non-experimental (observational) data, that is, data not coming from an A/B test or a randomized controlled trial, is unobserved confounding.

Let's illustrate this problem with an example. Suppose that we want to set an optimal price for a hotel room, and we have historical data on weekly price and demand (and potentially other observed features associated with each week). It is highly likely that, in the historical data, price was raised in anticipation of high demand due to signals that are only observed by the price setter. For instance, there could be an event happening in town that is not marked in our data, and due to that event, prices for a room surged in that week. At the same time, demand also surged. If we were to train an ML model on such data, we could register the spurious correlation that demand increases when price increases and predict that very high prices maximize revenue. This is obviously an erroneous conclusion and illustrates the pitfalls of making decisions based on predictive models.

Figure 1: A predictive model uses the spurious correlation created by the events-in-town example in the paragraph above. The predictive model is not causal, since it learns that high prices lead to high demand, and it could lead to erroneous decisions if treated as a causal model.
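To see the pitfall in Figure 1 in miniature, here is a small, hypothetical simulation (the setup and numbers are ours, not from the paper): an unobserved event drives both price and demand upward, so a naive regression of demand on price finds a positive slope even though the assumed true causal effect of price on demand is negative.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5_000

# Unobserved confounder: is there a big event in town this week?
event = rng.binomial(1, 0.3, size=n)

# The price setter sees the event coming and raises prices in anticipation.
price = 100 + 40 * event + rng.normal(0, 5, size=n)

# Assumed true causal model: demand falls with price but surges during events.
demand = 500 - 2.0 * price + 300 * event + rng.normal(0, 10, size=n)

# Naive predictive fit of demand on price ignores the confounder.
naive_slope = np.polyfit(price, demand, 1)[0]

# Fit that adjusts for the confounder (only possible because we simulated it).
X = np.column_stack([price, event, np.ones(n)])
adjusted_slope = np.linalg.lstsq(X, demand, rcond=None)[0][0]

print(f"naive slope:    {naive_slope:+.2f}  (higher prices appear to raise demand)")
print(f"adjusted slope: {adjusted_slope:+.2f}  (the assumed causal effect is -2.00)")
```

In real data the event variable is not recorded, so the adjusted fit is unavailable; that is exactly why we need tools like the instrumental variables discussed next.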

Enter econometrics and instrumental variables. How can we remove biases due to unobserved confounding and get at the causal relationship? Removing such observational biases has been a staple of econometric analysis and, more broadly, of the field of causal inference. One very popular method in econometrics is that of instrumental variables, which dates back to some of the earliest work in empirical economics in the 1920s (see the list of resources at the end of this section).

Figure 2: A causal diagram depicting the assumptions that a variable Z needs to satisfy to constitute an instrumental variable. Z is the instrument, T is the treatment/action, Y is the outcome of interest, and U represents unobserved confounding variables that correlate with both the treatment and the outcome. The important assumption is that there is no arrow connecting the instrument Z directly to the outcome Y; all paths from Z to Y go through T.

An instrument is an observed variable in the data that affects which action was taken but has no direct causal effect on the outcome of interest. Using another pricing example, suppose that we want to identify how demand for coffee in the US is affected by variations in the price. We observe that weather in Brazil can affect the production cost of coffee, which can subsequently affect the price of coffee in the US. However, the weather in Brazil has no direct effect on the demand for coffee in the US. Thus, the weather in Brazil can be used as an instrument to uncover the causal relationship between price and demand in the US. Though the process of identifying a valid instrument is a delicate task, as it is context-specific and requires domain knowledge, using instrumental variables is a ubiquitous approach in causal inference (a small numerical sketch of the idea follows Figure 3 below).

Figure 3: A pictorial depiction of the causal model fitted via the instrumental variable method. Unlike the predictive model, the IV-based model first collapses the data points by replacing them with their within-group averages, where the group of each point is dictated by the value of the instrument. Then the causal model is the predictive model that is fitted solely on these within-group average data points.
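Here is the promised sketch, in the spirit of the coffee example and of Figure 3; the data-generating process and numbers are hypothetical. A discrete instrument (weather severity in Brazil) shifts price but not demand directly, so averaging price and demand within each instrument value and fitting a line through those few averaged points recovers the assumed causal slope, while a naive fit on the raw data does not.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 9_000

# Discrete instrument: weather severity in Brazil (0 = mild, 1 = bad, 2 = severe).
# It shifts US coffee prices through production costs but has no direct
# effect on US demand.
weather = rng.integers(0, 3, size=n)

# Unobserved confounder that moves both price and demand.
confounder = rng.normal(0, 1, size=n)

price = 5 + 1.0 * weather + 1.0 * confounder + rng.normal(0, 0.5, size=n)
# Assumed true causal effect of price on demand: -1.5
demand = 20 - 1.5 * price + 2.0 * confounder + rng.normal(0, 0.5, size=n)

# Naive predictive fit on the raw data is biased by the confounder.
naive_slope = np.polyfit(price, demand, 1)[0]

# IV-style fit in the spirit of Figure 3: collapse to within-group averages,
# one (price, demand) point per instrument value, then fit on those points.
levels = np.unique(weather)
avg_price = np.array([price[weather == z].mean() for z in levels])
avg_demand = np.array([demand[weather == z].mean() for z in levels])
iv_slope = np.polyfit(avg_price, avg_demand, 1)[0]

print(f"naive slope:      {naive_slope:+.2f}")
print(f"grouped-IV slope: {iv_slope:+.2f}  (assumed truth: -1.50)")
```

Roughly speaking, with a continuous instrument the same idea is usually implemented as two-stage least squares; our work replaces these rigid linear stages with flexible ML function classes trained adversarially.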