{"id":868917,"date":"2022-11-15T07:22:30","date_gmt":"2022-11-15T15:22:30","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-blog-post&p=868917"},"modified":"2022-11-15T12:33:24","modified_gmt":"2022-11-15T20:33:24","slug":"deep-dive-into-variance-reduction","status":"publish","type":"msr-blog-post","link":"https:\/\/www.microsoft.com\/en-us\/research\/articles\/deep-dive-into-variance-reduction\/","title":{"rendered":"Deep Dive Into Variance Reduction"},"content":{"rendered":"\n

Variance Reduction (VR) is a popular topic that is frequently discussed in the context of A/B testing. However, it requires a deeper understanding to maximize its value in an A/B test. In this blog post, we will answer questions including: What does the "variance" in VR refer to? Will VR make A/B tests more trustworthy? How will VR impact the ability to detect true change in A/B metrics?

This blog post provides an overview of ExP's implementation of VR, a technique called CUPED (Controlled experiment Using Pre-Experiment Data). Other authors have contributed excellent explainers of CUPED's performance and its ubiquity as an industry-standard variance reduction technique [1][2]. We have covered in previous blog posts how ExP uses CUPED in the experiment lifecycle [3].
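To make the mechanics concrete, here is a minimal sketch of the standard CUPED adjustment with a single pre-experiment covariate, applied to simulated data. The variable names, simulated data, and helper function are illustrative assumptions on our part, not ExP's production implementation.

```python
# Minimal CUPED sketch (illustrative only, not ExP's production code).
# y: in-experiment metric per user; x: the same metric measured in the
# pre-experiment period; treat: randomized treatment assignment.
import numpy as np

def cuped_adjust(y, x):
    """Return the CUPED-adjusted metric: y - theta * (x - mean(x))."""
    theta = np.cov(x, y)[0, 1] / np.var(x, ddof=1)  # regression slope of y on x
    return y - theta * (x - x.mean())

rng = np.random.default_rng(7)
n = 100_000
x = rng.normal(10.0, 2.0, size=n)                      # pre-experiment metric
treat = rng.integers(0, 2, size=n).astype(bool)        # random assignment
y = x + rng.normal(0.0, 1.0, size=n) + 0.05 * treat    # in-experiment metric

y_adj = cuped_adjust(y, x)

# The estimated treatment effect is essentially unchanged...
delta_raw = y[treat].mean() - y[~treat].mean()
delta_adj = y_adj[treat].mean() - y_adj[~treat].mean()
print(delta_raw, delta_adj)

# ...while the metric's variance (and hence the estimator's variance)
# shrinks by roughly a factor of 1 - corr(x, y)**2.
print(y.var(ddof=1) / y_adj.var(ddof=1))
```

Because the pre-experiment covariate is independent of the treatment assignment, the adjustment leaves the expected treatment effect unchanged while shrinking the metric's variance by roughly one minus the squared correlation between the covariate and the metric.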

In this post, we share the foundations of VR in statistical theory and how it amplifies the power of an A/B testing program without increasing the likelihood of making a wrong decision. [a][4]



[a] Many of the topics touched on briefly in this blog are covered in excellent detail in Causal Inference and Its Applications in Online Industry [4].

Variance is a Statistical Property of Estimators

To understand where variance reduction fits in, let's start with a more fundamental question: What's our ideal case for analyzing an A/B test? We want to estimate the difference between two potential outcomes for a user: the outcome in a world where the treatment was applied, and the outcome in a world where it was not applied (the counterfactual).

The fundamental challenge of causal inference is that we cannot observe those two worlds simultaneously, so we must come up with a process for estimating the counterfactual difference. In A/B testing, that process relies on applying treatments to different users. Different users are never perfect substitutes for one another, because their outcomes are functions not only of the treatment assignment but also of many other factors that influence user behavior.
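In the standard Neyman–Rubin potential-outcomes notation, used here only for illustration, the unobservable user-level difference and the quantity an A/B test actually estimates can be written as follows.

```latex
% Y_i(1): user i's outcome if treated; Y_i(0): the same user's outcome if not.
% The user-level counterfactual difference is never observed:
\tau_i = Y_i(1) - Y_i(0)

% An A/B test instead targets the average treatment effect and estimates it
% with the difference in observed means across the randomized groups:
\tau = \mathbb{E}\left[ Y(1) - Y(0) \right],
\qquad
\hat{\tau} = \bar{Y}_{\mathrm{treatment}} - \bar{Y}_{\mathrm{control}}
```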

Causal inference is a set of scientific methods to estimate the counterfactual difference in potential outcomes between our two imagined worlds. Any process of estimating this counterfactual difference introduces uncertainty.
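That uncertainty is summarized by the variance of the estimator. For the difference-in-means estimator above, under the standard assumption of independent samples of treated and control users, the familiar result is:

```latex
% n_T, n_C: number of treated and control users;
% sigma_T^2, sigma_C^2: variances of the outcome in each group.
\operatorname{Var}(\hat{\tau}) = \frac{\sigma_T^2}{n_T} + \frac{\sigma_C^2}{n_C}
```

This variance is the "variance" that variance reduction techniques aim to shrink.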

Statistical inference is the process of proposing and refining estimators of an average counterfactual difference to improve the estimators' core statistical properties: