{"id":563946,"date":"2019-01-25T16:37:23","date_gmt":"2019-01-26T00:37:23","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?p=563946"},"modified":"2019-01-25T16:36:34","modified_gmt":"2019-01-26T00:36:34","slug":"creating-better-ai-partners-a-case-for-backward-compatibility","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/research\/blog\/creating-better-ai-partners-a-case-for-backward-compatibility\/","title":{"rendered":"Creating better AI partners: A case for backward compatibility"},"content":{"rendered":"
Artificial intelligence technologies hold great promise as partners in the real world. They're in the early stages of helping doctors administer care to their patients and lenders determine the risk associated with loan applications, among other examples. But what happens when a system that users have come to understand, and have learned to fold into their work, is updated? We can reasonably assume an improvement in accuracy or speed on the part of the agent, a seemingly beneficial change. However, current practices for updating the models that power AI partners don't account for how practitioners have learned over time to trust and make use of an agent's contributions.
Our team, which also includes graduate student Gagan Bansal, Microsoft Technical Fellow Eric Horvitz, University of Washington professor Daniel S. Weld, and University of Michigan assistant professor Walter S. Lasecki, focuses on this crucial step in the life cycle of machine learning models. In the work we're presenting next week at the Association for the Advancement of Artificial Intelligence's annual conference (AAAI 2019), we introduce a platform, openly available on GitHub, to help better understand the human-AI dynamics in these types of settings.

Updates are usually motivated either by additional training data or by algorithmic and optimization advances. Currently, an upgrade to an AI system is informed by improvements in model performance alone, often measured in terms of empirical accuracy on benchmark datasets. Such metrics, which capture the performance of the AI component in isolation, are not sufficient when the AI technology is used by people to accomplish tasks. Since models sometimes make mistakes, the success of human-AI teams in decision-making relies on the human partner building a mental model of the machine's behavior and learning when to trust it, so that he or she can decide when to override its decisions. We show in the research we'll be sharing at AAAI that updates that are not optimized for human-AI teams can cause significant disruptions in the collaboration by violating human trust.

Imagine a doctor using a diagnosis model she has found to be most helpful in cases involving her older patients. Let's say it's been 95 percent accurate. After an update, the model sees an overall increase in accuracy for all patients, to 98 percent, but, unbeknownst to the doctor, it introduces new mistakes that lead to poorer performance on older patients. Even though the model has improved, the doctor may now take the wrong action, lowering team performance. In fact, through human studies we present in the paper, we show that an update to a more accurate machine-learned model that is incompatible with the mental model of the human user (that is, an updated model that makes errors on specific cases the previous version was getting right) can hurt team performance instead of improving it. This empirical result provides evidence for undertaking a more comprehensive optimization, one that considers the performance of the human-AI team rather than the performance of only the AI component.
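To make this failure mode concrete, here is a minimal sketch in Python of one way backward compatibility could be quantified: the fraction of examples the previous model classified correctly that the updated model still gets right. The function name and the toy labels and predictions are hypothetical illustrations, not the paper's code; the point is that an update can raise overall accuracy while this score stays below 1.0, meaning it has introduced new errors on cases a user had learned to trust.

```python
import numpy as np

def compatibility_score(y_true, old_pred, new_pred):
    """Fraction of the old model's correct predictions that the new
    model also gets right (1.0 means fully backward compatible)."""
    old_correct = old_pred == y_true
    new_correct = new_pred == y_true
    if old_correct.sum() == 0:
        return 1.0  # vacuously compatible: the old model was never right
    return (old_correct & new_correct).sum() / old_correct.sum()

# Hypothetical ground-truth labels and predictions for ten patients
y_true   = np.array([1, 0, 1, 1, 0, 1, 0, 1, 1, 0])
old_pred = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 1])  # 8/10 correct
new_pred = np.array([1, 0, 0, 1, 0, 1, 0, 1, 1, 0])  # 9/10 correct

print(np.mean(old_pred == y_true))                     # 0.8
print(np.mean(new_pred == y_true))                     # 0.9  (more accurate)
print(compatibility_score(y_true, old_pred, new_pred)) # 0.875 (one new error)
```

In the doctor's scenario, a compatibility score below 1.0 on the slice of older patients is exactly what erodes her mental model: the new errors land on the cases she had learned to delegate to the machine.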