{"id":1141125,"date":"2025-06-23T09:40:10","date_gmt":"2025-06-23T16:40:10","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-story&p=1141125"},"modified":"2025-12-15T14:58:19","modified_gmt":"2025-12-15T22:58:19","slug":"ai-testing-and-evaluation-learnings-from-science-and-industry","status":"publish","type":"msr-story","link":"https:\/\/www.microsoft.com\/en-us\/research\/story\/ai-testing-and-evaluation-learnings-from-science-and-industry\/","title":{"rendered":"AI Testing and Evaluation: Learnings from Science and Industry"},"content":{"rendered":"\n
<\/div><\/span>
\n
\n
<\/div>\n\n\n\n

AI Testing and Evaluation: Learnings from Science and Industry<\/h1>\n\n\n\n
<\/div>\n<\/div>\n<\/div><\/div>\n\n\n\n\n\n
\n
\n
<\/div>\n\n\n\n
\n
<\/div>\n\n\n\n

Discover how Microsoft is learning from other domains to advance evaluation and testing as a pillar of AI governance.<\/h2>\n\n\n\n
<\/div>\n\n\n\n
<\/div>\n\n\n\n

Generative AI presents a unique challenge and opportunity to reexamine governance practices for the responsible development, deployment, and use of AI. To advance thinking in this space, Microsoft has tapped into the experience and knowledge of experts across domains\u2014from genome editing to cybersecurity\u2014to investigate the role of testing and evaluation as a governance tool. AI Testing and Evaluation: Learnings from Science and Industry, <\/em>hosted by Microsoft Research\u2019s Kathleen Sullivan<\/a>, explores what the technology industry and policymakers can learn from these fields and how that might help shape the course of AI development.<\/p>\n\n\n\n


\n\n\n\n

Episodes<\/h2>\n\n\n\n

Introducing \u2018AI Testing and Evaluation: Learnings from Science and Industry\u2019<\/h3>\n\n\n\n

Amanda Craig Deckard | June 23, 2025<\/p>\n\n\n\n

In the introductory episode of this new series, host Kathleen Sullivan and Senior Director Amanda Craig Deckard explore Microsoft\u2019s efforts to draw on the experience of other domains to help advance the role of AI testing and evaluation as a governance tool.<\/p>\n\n\n\n

\"Illustrated<\/a><\/figure>\n\n\n\n
\n
\n
\n
\n
\n\t
\n\t\t
\n\t\t\t\t\t\tPodcast<\/span>\n\t\t\tAI Testing and Evaluation: Learnings from Science and Industry<\/span> <\/span><\/a>\t\t\t\t\t<\/div>\n\t<\/article>\n<\/div>\n<\/div>\n\n\n\n
\n
\n\t
\n\t\t
\n\t\t\t\t\t\tBlog<\/span>\n\t\t\tLearning from other domains to advance AI evaluation and testing<\/span> <\/span><\/a>\t\t\t\t\t<\/div>\n\t<\/article>\n<\/div>\n<\/div>\n<\/div>\n\n\n\n

Guest<\/h4>\n\n\n\n
\"illustration<\/figure>\n\n\n\n

Amanda Craig Deckard<\/a><\/strong>
Amanda Craig Deckard is senior director of public policy in Microsoft\u2019s Office of Responsible AI, where she leads efforts to strengthen AI governance as a foundation for trust and innovation.<\/p>\n\n\n\n

<\/div>\n\n\n\n
\n<\/div>\n<\/div>\n\n\n\n
\n

Episode 1 | AI Testing and Evaluation: Learnings from genome editing <\/h3>\n\n\n\n

Alta Charo, Daniel Kluttz | June 30, 2025<\/p>\n\n\n\n

Bioethics and law expert Alta Charo explores the value of regulating technologies at the application level and the role of coordinated oversight in genome editing, while Microsoft GM Daniel Kluttz reflects on Charo\u2019s points, drawing parallels to AI governance.<\/p>\n\n\n\n

\n
\n
\"Outline<\/figure>\n<\/div>\n\n\n\n
\n
\n\t
\n\t\t
\n\t\t\t\t\t\tPodcast<\/span>\n\t\t\tAI Testing and Evaluation: Learnings from genome editing<\/span> <\/span><\/a>\t\t\t\t\t<\/div>\n\t<\/article>\n<\/div>\n<\/div>\n<\/div>\n\n\n\n

Guests<\/h4>\n\n\n\n
\"Alta<\/figure>\n\n\n\n

Alta Charo (opens in new tab)<\/span><\/a><\/strong>
Alta Charo, the Warren P. Knowles Professor Emerita of Law and Bioethics, is a biotechnology policy and ethics consultant who has been at the forefront of the field for decades.<\/p>\n\n\n\n

<\/div>\n\n\n\n
\"Daniel<\/figure>\n\n\n\n

Daniel Kluttz<\/strong> (opens in new tab)<\/span><\/a>
Daniel Kluttz is a partner general manager in Microsoft\u2019s Office of Responsible AI, where he leads the group\u2019s Sensitive Uses and Emerging Technologies program.<\/p>\n\n\n\n

<\/div>\n\n\n\n
\n<\/div>\n\n\n\n
\n

Episode 2 | AI Testing and Evaluation: Learnings from pharmaceuticals and medical devices<\/h3>\n\n\n\n

Daniel Carpenter, Timo Minssen, Chad Atalla | July 7, 2025<\/p>\n\n\n\n

Professors Daniel Carpenter and Timo Minssen explore evolving pharma and medical device regulation, including the role of clinical trials, while Microsoft applied scientist Chad Atalla shares where AI governance stakeholders might find inspiration in these fields.<\/p>\n\n\n\n

\n
\n
\"Illustrated<\/a><\/figure>\n<\/div>\n\n\n\n
\n
\n\t
\n\t\t
\n\t\t\t\t\t\tPodcast<\/span>\n\t\t\tAI Testing and Evaluation: Learnings from pharmaceuticals and medical devices<\/span> <\/span><\/a>\t\t\t\t\t<\/div>\n\t<\/article>\n<\/div>\n<\/div>\n<\/div>\n\n\n\n

Guests<\/h4>\n\n\n\n
\"Daniel<\/figure>\n\n\n\n

Daniel Carpenter<\/strong> (opens in new tab)<\/span><\/a>
Daniel Carpenter is the Allie S. Freed Professor of Government and chair of the Department of Government at Harvard. His research spans social and political science, including pharmaceutical regulation.<\/p>\n\n\n\n

<\/div>\n\n\n\n
\"Timo<\/figure>\n\n\n\n

Timo Minssen<\/strong> (opens in new tab)<\/span><\/a>
Timo Minssen is a law professor at the University of Copenhagen, where he leads the Center for Advanced Studies in Bioscience Innovation Law. He specializes in legal aspects of biomedical innovation.<\/p>\n\n\n\n

<\/div>\n\n\n\n
\"Chad<\/figure>\n\n\n\n

Chad Atalla<\/strong><\/a>
Chad Atalla is a senior applied scientist in Microsoft Research New York City’s Sociotechnical Alignment Center, where they contribute to responsible AI research and practical responsible AI solutions.<\/p>\n\n\n\n

<\/div>\n\n\n\n
\n<\/div>\n\n\n\n
\n

Episode 3 | AI Testing and Evaluation: Learnings from cybersecurity <\/h3>\n\n\n\n

Ciaran Martin, Tori Westerhoff | July 14, 2025<\/p>\n\n\n\n

Drawing on his previous work as the UK\u2019s cybersecurity chief, Professor Ciaran Martin explores differentiated standards and public-private partnerships in cybersecurity, and Microsoft\u2019s Tori Westerhoff examines the insights through an AI red-teaming lens.<\/p>\n\n\n\n

\n
\n
\"Illustrated<\/a><\/figure>\n<\/div>\n\n\n\n
\n
\n\t
\n\t\t
\n\t\t\t\t\t\tPodcast<\/span>\n\t\t\tAI Testing and Evaluation: Learnings from cybersecurity<\/span> <\/span><\/a>\t\t\t\t\t<\/div>\n\t<\/article>\n<\/div>\n<\/div>\n<\/div>\n\n\n\n

Guests<\/h4>\n\n\n\n
\"Illustration<\/a><\/figure>\n\n\n\n

Ciaran Martin (opens in new tab)<\/span><\/a><\/strong>
Ciaran Martin is a professor of practice in the management of public organizations at the University of Oxford. Previously, he was the founding chief executive of the UK\u2019s National Cyber Security Centre.<\/p>\n\n\n\n

<\/div>\n\n\n\n
\"Illustration<\/a><\/figure>\n\n\n\n

Tori Westerhoff (opens in new tab)<\/span><\/a><\/strong>
A principal director on the Microsoft AI Red Team, Tori Westerhoff leads AI security and safety red team operations and dangerous capability testing, directly informing company leadership.<\/p>\n\n\n\n

<\/div>\n\n\n\n
\n<\/div>\n\n\n\n
\n

Episode 4 | AI Testing and Evaluation: Reflections<\/h3>\n\n\n\n

Amanda Craig Deckard | July 21, 2025<\/p>\n\n\n\n

In the series finale, Amanda Craig Deckard returns to examine what Microsoft has learned about testing as a governance tool. She also explores the roles of rigor, standardization, and interpretability in testing and what\u2019s next for Microsoft\u2019s AI governance work.<\/p>\n\n\n\n

\n
\n
\"Illustrated<\/a><\/figure>\n<\/div>\n\n\n\n
\n
\n\t
\n\t\t
\n\t\t\t\t\t\tPodcast<\/span>\n\t\t\tAI Testing and Evaluation: Reflections<\/span> <\/span><\/a>\t\t\t\t\t<\/div>\n\t<\/article>\n<\/div>\n\n\n\n
\n\t
\n\t\t
\n\t\t\t\t\t\tPUBLICATION<\/span>\n\t\t\tLearning from other domains to advance AI evaluation and testing<\/span> <\/span><\/a>\t\t\t\t\t<\/div>\n\t<\/article>\n<\/div>\n<\/div>\n<\/div>\n\n\n\n

Guests<\/h4>\n\n\n\n
\"Illustrated<\/figure>\n\n\n\n

Amanda Craig Deckard<\/a><\/strong>
Amanda Craig Deckard is senior director of public policy in Microsoft\u2019s Office of Responsible AI, where she leads efforts to strengthen AI governance as a foundation for trust and innovation.<\/p>\n\n\n\n

<\/div>\n\n\n\n
<\/div>\n\n\n\n
\n<\/div>\n<\/div>\n\n\n\n
<\/div>\n<\/div>\n\n\n\n
\n
\n\t\n\t
\n\t\t
\n\t\t\t

Series contributors<\/strong>: Neeltje Berger, Tetiana Bukhinska, David Celis Garcia, Matt Corwine, Amanda Craig Deckard, Kristina Dodge, Chris Duryee, Milan Gandhi, Ann Griffin, Alyssa Hughes, Gretchen Huizinga, Matthew McGinley, Amanda Melfi, Joe Plummer, Brenda Potts, Kathleen Sullivan, Amber Tingle, Kathleen Toohill, Craig Tuschoff, Sarah Wang, Brian Wesolowski, and Katie Zoller.<\/em><\/p>\n\n\n\n

Series launched on June 23, 2025<\/em><\/p>\t\t<\/div>\n\t<\/div>\n\n\t<\/div>\n\n\n\n

<\/div>\n\n\n\n
\n
\n

Other resources<\/h3>\n<\/div>\n\n\n\n
\n
\n
\n

Responsible AI at Microsoft<\/a><\/p>\n\n\n\n

Global Governance:
Goals and Lessons for AI (opens in new tab)<\/span><\/a><\/p>\n<\/div>\n\n\n\n

\n

Microsoft Research Podcast<\/a><\/p>\n\n\n\n

Learning from other domains to advance AI evaluation and testing<\/a><\/p>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n\n\n\n

<\/div>\n<\/article>\n\n\n\n\n\n

Frequently asked questions<\/h2>\n\n\n\n\n\n

AI governance refers to the frameworks, policies, best practices, and tools that guide the responsible development, deployment, and use of AI. At Microsoft, this includes working to ensure the alignment of AI systems with our Responsible AI Standard<\/a>, which we continue to build on as new AI capabilities, risks, and regulatory requirements emerge. Read more about Microsoft\u2019s internal approach to AI governance in our 2025 Responsible AI Transparency Report (opens in new tab)<\/span><\/a>.<\/p>\n\n\n\n\n\n

AI evaluations are structured ways to test how AI models and systems perform and where they could go wrong. Because this is a rapidly evolving field, there\u2019s no single agreed way to categorize these tests. Different methods are used depending on what is being tested and when it is tested.<\/p>\n\n\n\n

The International AI Safety Report 2025 (opens in new tab)<\/span><\/a>\u2014the world\u2019s first comprehensive synthesis of research on the capabilities and risks of advanced AI systems\u2014defines AI evaluations as \u201csystematic assessments of an AI system\u2019s performance, capabilities, vulnerabilities or potential impacts. Evaluations can include benchmarking, red-teaming and audits and can be conducted both before and after model deployment.\u201d<\/p>\n\n\n\n\n\n

Many of the aims of evaluating generative AI models and systems resemble those of evaluating traditional software, such as assessing performance and reliability. However, there is growing recognition that evaluating generative AI is more challenging than evaluating traditional machine learning systems. This is because generative AI systems accept a wide range of inputs, produce diverse outputs, support numerous use cases, and can have impacts on people and society that range from mundane to consequential. We explore these challenges in Part 1 of our 2025 white paper, Learning from Other Domains to Advance AI Evaluation and Testing<\/a><\/em>.<\/p>\n\n\n\n\n\n

Microsoft recognizes that governance is not a blank slate. Many other domains have long histories of managing complex, impactful technologies in high-stakes settings. By engaging experts from these domains, Microsoft aims to learn from the strengths and shortfalls of established governance and public policy strategies, adapting insights to the unique challenges of AI. <\/p>\n\n\n\n

This cross-domain learning has helped further shape Microsoft\u2019s approach to AI governance and contributions to public policy discussions.<\/p>\n\n\n\n\n\n

<\/div>\n\n\n","protected":false},"excerpt":{"rendered":"

Generative AI presents a unique challenge and opportunity to reexamine governance practices for the responsible development, deployment, and use of AI. To advance thinking in this space, Microsoft has tapped into the experience and knowledge of experts across domains\u2014from genome editing to cybersecurity\u2014to investigate the role of testing and evaluation as a governance tool. AI […]<\/p>\n","protected":false},"featured_media":1141306,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","footnotes":""},"research-area":[13556],"msr-locale":[268875],"msr-post-option":[],"class_list":["post-1141125","msr-story","type-msr-story","status-publish","has-post-thumbnail","hentry","msr-research-area-artificial-intelligence","msr-locale-en_us"],"related-researchers":[{"type":"user_nicename","display_name":"Kathleen Sullivan","user_id":40949,"people_section":"Section name 0","alias":"kasull"},{"type":"user_nicename","display_name":"Amanda Craig Deckard","user_id":43899,"people_section":"Section name 
0","alias":"amcraig"}],"related-publications":[1147724],"related-downloads":[],"related-videos":[],"related-projects":[],"related-groups":[],"related-events":[],"related-posts":[1140810,1142130,1142208,1143099,1144381],"msr_impact_theme":[],"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-story\/1141125","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-story"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-story"}],"version-history":[{"count":57,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-story\/1141125\/revisions"}],"predecessor-version":[{"id":1158595,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-story\/1141125\/revisions\/1158595"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media\/1141306"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=1141125"}],"wp:term":[{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=1141125"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=1141125"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=1141125"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}