Articles
“It is difficult to make predictions, especially about the future.” – Yogi Berra (perhaps apocryphal)
How well can experiments be used to predict the future? At Microsoft’s Experimentation Platform (ExP), we pride ourselves on ensuring the trustworthiness of our experiments.…
By Sinem Akinci, Microsoft Developer Division, and Cindy Chiu, Microsoft Experimentation Platform
Generative AI [1] leverages deep learning models to identify underlying patterns and generate original content, such as text, images, and videos. This technology has been applied to various…
The Experimentation Platform at Microsoft (ExP) has evolved over the past sixteen-plus years and now runs thousands of online A/B tests across most major Microsoft products every month. Throughout this time, we have seen impactful A/B tests on a huge…
Over the past year, excitement around Large Language Models (LLMs) has skyrocketed. With ChatGPT and BingChat, we saw LLMs approach human-level performance on tasks ranging from standardized exams to generative art. However, many of these LLM-based features are new and…
A/B Interactions: A Call to Relax
If you’re a regular reader of the Experimentation Platform blog, you know that we’re always warning our customers to be vigilant when running A/B tests. We warn them about the pitfalls of even tiny SRMs (sample ratio mismatches), small bits…
Deep Dive Into Variance Reduction
Variance Reduction (VR) is a popular topic that is frequently discussed in the context of A/B testing. However, getting the most value from it in an A/B test requires a deeper understanding of how it works. In this blog post, we will answer questions including:…
An “event-based” A/B test is a method used to test two or more variants over a limited period of time. We can use what we learn to increase user engagement, satisfaction, or retention of a product, while also applying our insights to…
STEDII Properties of a Good Metric
Good metrics enable good decisions. What makes a metric good? In this blog post we introduce the STEDII (Sensitivity, Trustworthiness, Efficiency, Debuggability, Interpretability, and Inclusivity) framework to define and evaluate the good properties of a metric and of an A/B…
Imagine that you have developed a new hypothesis for how to improve the user experience of your product. Now you need to test it. There are many ways that you could approach this. For instance, running an A/B test, engaging…