{"id":1100547,"date":"2024-11-12T16:13:26","date_gmt":"2024-11-13T00:13:26","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-blog-post&p=1100547"},"modified":"2024-11-15T15:44:55","modified_gmt":"2024-11-15T23:44:55","slug":"experimentation-in-genai-c-teams-practices-for-continuous-improvement","status":"publish","type":"msr-blog-post","link":"https:\/\/www.microsoft.com\/en-us\/research\/articles\/experimentation-in-genai-c-teams-practices-for-continuous-improvement\/","title":{"rendered":"Experimentation in Generative AI: C++ Team\u2019s Practices for Continuous Improvement"},"content":{"rendered":"\n
By Sinem Akinci, Microsoft Developer Division, and Cindy Chiu, Microsoft Experimentation Platform

Generative AI [1] leverages deep learning models to identify underlying patterns and generate original content, such as text, images, and videos. This technology has been applied across industries, including customer service, marketing, and software development. A popular example is GitHub Copilot, which generates code based on open-source data.

The generative AI space is undergoing rapid transformation, with new updates and changes emerging daily. Products leveraging generative AI must constantly decide on the right set of parameters, models, and prompts to find the best combination. Experimentation plays a crucial role in navigating this dynamic landscape, enabling data-driven decision-making and the continuous refinement of generative AI features. As a case study, we will explore how the Microsoft C++ team applies this in practice, using experimentation to develop and refine GitHub Copilot features.

In this blog post, we first provide a general overview of best practices for experimenting with and evaluating generative AI features. We then highlight how the C++ team uses some of these practices to develop GitHub Copilot features and explain how they benefit the product. Lastly, we conclude with an example of a new feature we shipped leveraging these practices.

## Methods for making data-driven decisions for generative AI products
### What are qualitative methods?

Qualitative methods [2] offer valuable insights into the user experience through approaches such as usability studies, surveys, focus groups, interviews, and diary studies. These methods help uncover nuances that are hard for quantitative methods to capture, providing an initial understanding of user interactions. However, because qualitative findings often come from smaller sample sizes, they may not provide a complete picture on their own. They are best used to identify gaps between features and user needs, which is particularly valuable for generative AI features that involve both model-generated content and user interface.
### What are quantitative methods?

Quantitative methods for evaluating generative AI features fall into two categories: offline evaluation and online evaluation.

**Offline evaluation**, which includes techniques like hyperparameter tuning and grid search, assesses model accuracy and feature performance before deployment. This approach works particularly well when there are known ground-truth values and clean datasets. By using various datasets and predefined metrics, developers can compare models against benchmarks cost-effectively, without exposing them to actual users.

**Online evaluation**, such as A/B testing, exposes the feature to actual customers. It verifies the results observed during offline testing in a real-world context, capturing true user interactions and ensuring the feature performs effectively in production.
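To make the offline approach concrete, here is a minimal sketch of a grid-search style evaluation in Python. Everything in it is illustrative: the `generate` stub stands in for whatever model call is under test, and the dataset, parameter grid, and exact-match metric are assumptions rather than the actual setup used by the C++ team.

```python
from itertools import product

# Hypothetical labeled dataset: (prompt, expected completion) pairs with known ground truth.
dataset = [
    ("Complete: for (int i = 0;", "i < n; ++i)"),
    ("Complete: std::vector<int> v;", "v.reserve(n);"),
]

# Candidate generation parameters to sweep over (illustrative values only).
param_grid = {
    "temperature": [0.0, 0.2, 0.8],
    "max_tokens": [32, 64],
}

def generate(prompt: str, temperature: float, max_tokens: int) -> str:
    # Stand-in for the real model call; swap in the actual generation API here.
    return "i < n; ++i)"

def exact_match(prediction: str, expected: str) -> float:
    # A deliberately simple offline metric; real evaluations often use richer ones.
    return float(prediction.strip() == expected.strip())

# Score every parameter combination on the same dataset, then rank the results.
results = []
for temperature, max_tokens in product(param_grid["temperature"], param_grid["max_tokens"]):
    scores = [
        exact_match(generate(prompt, temperature, max_tokens), expected)
        for prompt, expected in dataset
    ]
    results.append(((temperature, max_tokens), sum(scores) / len(scores)))

best_params, best_score = max(results, key=lambda r: r[1])
print(f"best (temperature, max_tokens): {best_params}, average score: {best_score:.2f}")
```

In practice the metric would usually be richer (for example, similarity to a reference completion or a unit-test pass rate), but the structure is the same: sweep candidate configurations over a fixed dataset and rank them before anything reaches users.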
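On the online side, an A/B test ultimately comes down to comparing a metric between randomized control and treatment groups. The sketch below shows one common way to analyze such a comparison, a two-proportion z-test on an acceptance-rate style metric; the counts are made up for illustration and are not data from any feature discussed in this post.

```python
import math

def two_proportion_ztest(successes_a, n_a, successes_b, n_b):
    """Two-sided z-test for the difference between two proportions (e.g., acceptance rates)."""
    p_a, p_b = successes_a / n_a, successes_b / n_b
    p_pool = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return p_a - p_b, z, p_value

# Illustrative counts only: accepted suggestions out of suggestions shown to each group.
delta, z, p_value = two_proportion_ztest(
    successes_a=1150, n_a=10_000,  # treatment group (new feature on)
    successes_b=1000, n_b=10_000,  # control group (existing behavior)
)
print(f"acceptance-rate lift: {delta:.3%}, z = {z:.2f}, p = {p_value:.4f}")
```

A small p-value here indicates that the observed lift is unlikely to be due to chance alone, which is the kind of evidence an A/B test adds on top of offline results.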
### Incorporating all methods into your product lifecycle