{"id":957738,"date":"2023-08-02T14:01:03","date_gmt":"2023-08-02T21:01:03","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-blog-post&p=957738"},"modified":"2023-08-02T14:01:08","modified_gmt":"2023-08-02T21:01:08","slug":"a-b-interactions-a-call-to-relax","status":"publish","type":"msr-blog-post","link":"https:\/\/www.microsoft.com\/en-us\/research\/articles\/a-b-interactions-a-call-to-relax\/","title":{"rendered":"A\/B Interactions: A Call to Relax"},"content":{"rendered":"
If you\u2019re a regular reader of the Experimentation Platform blog, you know that we\u2019re always warning our customers to be vigilant when running A\/B tests. We warn them about the pitfalls of even tiny SRMs (sample ratio mismatches), small amounts of data loss in joins, and other similar issues that can invalidate their A\/B tests [2<\/a>, 3<\/a>, 4<\/a>]. But today, we\u2019re going to switch gears and tell you to relax a little. We\u2019re going to show you why A\/B interactions, the dreaded scenario in which two or more tests interfere with each other, are not as common a problem as you might think. Don\u2019t get us wrong: we\u2019re not saying that you can completely let down your guard and ignore A\/B interactions altogether. We\u2019re just saying that they\u2019re rare enough that you can usually run your tests without worrying about them.<\/p>\n\n\n But we\u2019re getting ahead of ourselves. What are A\/B interactions? In an A\/B test, users are randomly split into control and treatment groups. Each group is exposed to a different product experience, and metrics are then compared between the two groups [1]. At Microsoft\u2019s Experimentation Platform (ExP), we have hundreds of A\/B tests running every day. In an ideal world, every A\/B test would get its own separate set of users. However, splitting users across so many A\/B tests would leave each test with only a small fraction of the traffic, dramatically decreasing its statistical power. Instead, we typically allow each user to be in multiple A\/B tests simultaneously. <\/p>\n\n\n\n For example, a ranker might have one A\/B test that changes the order of web results and another A\/B test that changes the UX. Both A\/B tests can run at the same time. Each user is assigned independently to the control or treatment of each test, so every user lands in one of four equally likely combinations: control in both tests, treatment in the ranking test only, treatment in the UX test only, or treatment in both.<\/p>\n\n\n\nA\/B Interactions<\/h2>\n\n\n\n
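Independent assignment to concurrent tests can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not a description of how ExP actually assigns users. It assumes hash-based bucketing: the user ID is hashed together with a per-test salt, and each test uses a 50/50 split. The function names `assign` and `combination_counts` and the test names `ranking-test` and `ux-test` are hypothetical. Because each test uses a different salt, a user's bucket in one test says essentially nothing about their bucket in the other. As a result, the four control/treatment combinations each receive roughly a quarter of the users.

```python
import hashlib
from collections import Counter


def assign(user_id: str, test_name: str) -> str:
    """Hypothetical bucketing scheme (an assumption, not ExP's actual method).

    Hashing the user ID with a per-test salt (the test name) gives each
    test its own pseudo-random 50/50 split, so a user's assignments in
    different tests are approximately independent.
    """
    digest = hashlib.sha256(f"{test_name}:{user_id}".encode("utf-8")).digest()
    # Interpret the first 8 bytes as an unsigned integer and split on parity.
    bucket = int.from_bytes(digest[:8], "big") % 2
    return "treatment" if bucket else "control"


def combination_counts(n_users: int) -> Counter:
    """Count users in each (ranking test, UX test) assignment combination."""
    counts: Counter = Counter()
    for i in range(n_users):
        uid = f"user-{i}"
        combo = (assign(uid, "ranking-test"), assign(uid, "ux-test"))
        counts[combo] += 1
    return counts


if __name__ == "__main__":
    n = 100_000
    for combo, count in sorted(combination_counts(n).items()):
        # Each share should be close to 0.25 if assignments are independent.
        print(combo, count, f"{count / n:.3f}")
```

Salting by test name is one common way to obtain independent splits from a single user ID. Reusing the same hash for both tests would instead put every user in the same arm of both tests, which is exactly the correlation this setup avoids.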
A case where concurrent A\/B tests are safe<\/h4>\n\n\n\n