Experimentation Platform (ExP)

Empower every AI builder at Microsoft to accelerate innovation through trusted, low-friction experimentation.

Articles

Universal insights despite data boundaries: how Microsoft’s Experimentation adapted to EU data protection

January 22, 2025

By Ada Wang and Hao Ai, Microsoft Experimentation Platform

For years, Microsoft’s experimentation platform (ExP) has been the backbone of running A/B experiments at a global scale, analyzing results and enabling data-driven decisions for Microsoft products used worldwide. ExP started at…

External Validity of Online Experiments: Can We Predict the Future?

November 20, 2024

“It is difficult to make predictions, especially about the future” – Yogi Berra (perhaps apocryphal)

How well can experiments be used to predict the future? At Microsoft’s Experimentation Platform (ExP), we pride ourselves on ensuring the trustworthiness of our experiments.…

Experimentation in Generative AI: C++ Team’s Practices for Continuous Improvement

November 12, 2024

By Sinem Akinci, Microsoft Developer Division and Cindy Chiu, Microsoft Experimentation Platform

Generative AI [1] leverages deep learning models to identify underlying patterns and generate original content, such as text, images, and…

Diagram illustrating an A/B test splitting traffic between two backends.

A/B Testing Infrastructure Changes at Microsoft ExP

January 29, 2024

The Experimentation Platform at Microsoft (ExP) has evolved over the past sixteen-plus years and now runs thousands of online A/B tests across most major Microsoft products every month. Throughout this time, we have seen impactful A/B tests on a huge…

How to Evaluate LLMs: A Complete Metric Framework

September 27, 2023

Over the past year, excitement around Large Language Models (LLMs) skyrocketed. With ChatGPT and BingChat, we saw LLMs approach human-level performance in everything from standardized exams to generative art. However, many of these LLM-based features are new and…

A/B Interactions: A Call to Relax

August 2, 2023

If you’re a regular reader of the Experimentation Platform blog, you know that we’re always warning our customers to be vigilant when running A/B tests. We warn them about the pitfalls of even tiny SRMs (sample ratio mismatches), small bits…

CUPED adjusts metrics by the predicted value from a regression of Y on X. The treatment effect estimate has a lower standard error. As a consequence, estimated confidence intervals are narrower and the power of tests is increased.
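A minimal sketch of the CUPED adjustment described in this caption, assuming Y is the in-experiment metric and X is a pre-experiment covariate for the same users. The function names `cuped_adjust` and `effect_and_se`, the pooled estimate of the regression slope, and the synthetic data are all illustrative assumptions; this is not ExP's implementation.

```python
import numpy as np

def cuped_adjust(y, x):
    """Return y minus the centered linear prediction from regressing y on x."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    # OLS slope of y on x: cov(x, y) / var(x)
    theta = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
    # Centering x keeps the overall mean of the metric unchanged
    return y - theta * (x - x.mean())

def effect_and_se(y_treatment, y_control):
    """Difference in means and its standard error (unpooled variances)."""
    delta = y_treatment.mean() - y_control.mean()
    se = np.sqrt(
        y_treatment.var(ddof=1) / len(y_treatment)
        + y_control.var(ddof=1) / len(y_control)
    )
    return delta, se

# Synthetic data (assumed for illustration): x strongly predicts y
rng = np.random.default_rng(0)
n = 5000
x = rng.normal(10.0, 2.0, size=2 * n)      # pre-experiment covariate
treated = np.arange(2 * n) < n             # first half is treatment
y = x + rng.normal(0.0, 1.0, size=2 * n) + 0.1 * treated

y_adj = cuped_adjust(y, x)                 # slope estimated on pooled data
raw = effect_and_se(y[treated], y[~treated])
adj = effect_and_se(y_adj[treated], y_adj[~treated])

print(f"raw:   delta={raw[0]:.3f}, se={raw[1]:.4f}")
print(f"CUPED: delta={adj[0]:.3f}, se={adj[1]:.4f}")
```

Because the covariate explains most of the variance of Y in this synthetic setup, the adjusted standard error should come out well below the raw one. With real metrics, the gain depends on how strongly X correlates with Y.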

Deep Dive Into Variance Reduction

November 15, 2022

Variance Reduction (VR) is a popular topic that is frequently discussed in the context of A/B testing. However, it requires a deeper understanding to maximize its value in an A/B test. In this blog post, we will answer questions including:…

For Event-based A/B tests: why they are special

September 26, 2022

An “event-based” A/B test is a method used to test two or more variables during a limited duration. We can use what we learn to increase user engagement, satisfaction, or retention of a product, while also applying our insights to…

STEDII Properties of a Good Metric

April 6, 2022

Good metrics enable good decisions. What makes a metric good? In this blog post we introduce the STEDII (Sensitivity, Trustworthiness, Efficiency, Debuggability, Interpretability, and Inclusivity) framework to define and evaluate the good properties of a metric and of an A/B…