{"id":832888,"date":"2022-04-06T09:30:31","date_gmt":"2022-04-06T16:30:31","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-blog-post&p=832888"},"modified":"2022-04-06T09:30:31","modified_gmt":"2022-04-06T16:30:31","slug":"stedii-properties-of-a-good-metric","status":"publish","type":"msr-blog-post","link":"https:\/\/www.microsoft.com\/en-us\/research\/articles\/stedii-properties-of-a-good-metric\/","title":{"rendered":"STEDII Properties of a Good Metric"},"content":{"rendered":"
When a product adopts an experimentation-driven culture, software development tends to shift from top-down decision-making to a more democratized approach. Instead of arguing about what should be built, product leaders define goals for metrics to improve the product, and they empower their teams to invest in changes that will ultimately achieve those goals. This allows the organization to innovate faster by testing multiple ideas for improvement, failing fast, and iterating.<\/p>\n
One of the best ways to experiment with a software product is to run A\/B tests. For successful A\/B tests, it is very important to have the right metrics. But what makes a good metric? Is the company\u2019s stock price a good metric for a product team? Probably not. It is not sensitive to small changes in the product, and we cannot observe the counterfactual \u2013 that is, the stock price in the universe where the treatment is not present. Perhaps the company could conduct an extensive user survey for each change, and then measure how satisfied its users are with the change. However, running such a survey for each product change would annoy users, would be very costly to scale, and would not reflect the overall user population, because many users won\u2019t respond to the survey. These examples demonstrate just how challenging it is to define a good A\/B metric.<\/p>\n
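So how do we define a good metric? After running hundreds of thousands of A\/B tests at Microsoft, we have identified six key properties of a good A\/B metric, which together spell STEDII: Sensitivity, Trustworthiness, Efficiency, Debuggability, Interpretability and Actionability, and Inclusivity and Fairness.<\/p>\n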
In this blog post, we will examine each of these properties more closely to understand what makes a good metric, and we will provide checks to test a metric against each property. We would like to emphasize, however, that these are <em>general<\/em> properties of a good experimentation metric. They are necessary properties for most experiment metrics, but they may not be sufficient for use in every single case. In later blog posts, for instance, we will discuss Overall Evaluation Criteria (OEC) metrics that should have additional properties, such as being a proxy for the overall product, user, and business health.<\/p>\n