{"id":878484,"date":"2022-09-19T09:00:00","date_gmt":"2022-09-19T16:00:00","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?p=878484"},"modified":"2022-09-16T13:20:05","modified_gmt":"2022-09-16T20:20:05","slug":"ai-models-vs-ai-systems-understanding-units-of-performance-assessment","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/research\/blog\/ai-models-vs-ai-systems-understanding-units-of-performance-assessment\/","title":{"rendered":"AI Models vs. AI Systems: Understanding Units of Performance Assessment"},"content":{"rendered":"\n
As AI becomes more deeply integrated into every aspect of our lives, it is essential that AI systems perform appropriately for their intended use. We know AI models can never be perfect, so how do we decide when AI performance is \u2018good enough\u2019<\/strong> for use in a real-life application? Is the level of accuracy<\/em> a sufficient gauge? What else matters? These are questions Microsoft Research tackles every day as part of our mission to follow a responsible, human-centered approach to building and deploying future-looking AI systems.<\/p>\n\n\n\n To answer the question, \u201cwhat is good enough?\u201d, it becomes necessary to distinguish between an AI model and an AI system as the unit of performance assessment<\/strong>. An AI model typically involves some input data, a pattern-matching algorithm, and an output classification. For example, a radiology scan of a patient\u2019s chest might be shown to an AI model to predict whether the patient has COVID-19. An AI system, by contrast, would evaluate a broader range of information about the patient, beyond the COVID-19 prediction, to inform a clinical decision and treatment plan.<\/p>\n\n\n\n Research has shown that human-AI collaboration can improve on the accuracy<\/strong> of AI models alone (reference<\/a>). In this blog post, we share key learnings from the recently retired Project Talia<\/a>, the prior collaboration between Microsoft Research and SilverCloud Health, to understand how thinking about the AI system as a whole, beyond the AI model, can help to more precisely define and enumerate \u2018good enough\u2019 for real-life application.<\/p>\n\n\n\n In Project Talia, we developed two AI models to predict treatment outcomes<\/strong> for patients receiving human-supported, internet-delivered cognitive behavioral treatment (iCBT) for symptoms of depression and anxiety. These AI models have the potential to assist the work practices of iCBT coaches. 
These iCBT coaches are practicing behavioral health professionals specifically trained to guide patients on the use of the treatment platform, recommend specific treatments, and help patients work through identified difficulties.<\/p>\n\n\n\n Project Talia offers an illustration of the distinction between the AI model produced during research and a resulting AI system that could be implemented to support real-life patient treatment. In this scenario, we demonstrate every system element that must be considered to ensure effective system outcomes, not just AI model outcomes.<\/p><\/blockquote>\n\n\n\n SilverCloud Health (acquired by Amwell in 2021) is an evidence-based, digital, on-demand mental health platform that delivers iCBT-based programs to patients in combination with limited but regular contact from an iCBT coach. The platform offers more than thirty iCBT programs, predominantly for treating mild-to-moderate symptoms of depression, anxiety, and stress.<\/p>\n\n\n\n Patients work through the program(s) independently and with regular assistance from the iCBT coach, who provides guidance and encouragement through weekly reviews and feedback on the treatment journey.<\/p>\n\n\n\n Previous research (reference<\/a>) has shown that involving a human coach within iCBT leads to more effective treatment outcomes for patients than unsupported interventions. 
To maximize the effects and outcomes of human support in this format, we developed AI models to dynamically predict the likelihood of a patient achieving a reliable improvement[1]<\/a> in their depression and anxiety symptoms by the end of the treatment program (typically 8 to 14 weeks in length).<\/p>\n\n\n\n Existing literature on feedback-informed therapy (reference<\/a>) and Project Talia research (reference<\/a>) suggest that having access to these predictions could provide reassurance for those patients \u2018on track\u2019 toward achieving a positive outcome from treatment, or prompt iCBT coaches to make appropriate adjustments to treatment to better meet patients\u2019 needs.<\/p>\n\n\n\nProject Talia: Improving Mental Health Outcomes<\/h2>\n\n\n\n
AI Model vs. AI System<\/h2>\n\n\n\n