The three visualization types for model evaluation output by the InterpretML implementation of GAMs (top) and by the implementation of SHAP in the SHAP Python package (bottom), both of which are popular and publicly available. Left column: global explanations. Middle column: component plot (GAMs) or dependence plot (SHAP). Right column: local explanations.<\/p><\/div>\n
Tools in practice<\/h3>\n
The study focuses on two popular and publicly available tools, each representative of one of the two techniques that dominate the space. The InterpretML implementation of GAMs takes a \u201cglassbox model\u201d approach, in which models are designed to be simple enough to understand. The implementation of SHAP in the SHAP Python package takes a post-hoc explanation approach for complex models. Each tool outputs three visualization types for model evaluation.<\/p>\n
Through pilot interviews with practitioners, the researchers identified six routine challenges that data scientists face in their day-to-day work. The researchers then set up an interview study in which each data scientist was given a dataset, a model, and one of the two tools, assigned at random. They examined how well 11 practitioners were able to use their assigned interpretability tool to uncover and address the routine challenges.<\/p>\n
The researchers found that participants lacked an overall understanding of the tools, particularly when reading and drawing conclusions from the visualizations, which contained importance scores and other values that weren\u2019t explicitly explained and so caused confusion. Despite this, the researchers observed, participants were inclined to trust the tools. Some came to rely on the visualizations to justify questionable outputs, treating the mere existence of the visualizations as enough proof of the tools\u2019 credibility, rather than using them to scrutinize model performance. The tools\u2019 public availability and widespread use also contributed to participants\u2019 confidence, with one participant pointing to a tool\u2019s availability as an indication that it \u201cmust be doing something right.\u201d<\/p>\n
Following the interview study, the researchers surveyed nearly 200 practitioners, who were asked to complete an adjusted version of the interview study task. The purpose was to scale up the findings and gain a sense of practitioners\u2019 overall perception and use of the tools. The survey largely confirmed what the interview study had found, namely that participants had difficulty understanding the visualizations and used them superficially, but it also revealed a path for future work around tutorials and interactive features to support practitioners in using the tools.<\/p>\n
\u201cOur next step is to explore ways of helping data scientists form the right mental models so that they can take advantage of the full potential of these tools,\u201d says Wortman Vaughan.<\/p>\n
The researchers conclude that as the interpretability landscape continues to evolve, it will remain important to study how well interpretability tools achieve their intended goals and how practitioners use and perceive them, both to improve the tools themselves and to support practitioners in using them productively.<\/p>\n
Putting people first<\/h3>\n
Fairness and interpretability aren\u2019t static, objective concepts. Because their definitions hinge on people and their unique circumstances, fairness and interpretability will always be changing. For Wallach and Wortman Vaughan, being responsible creators of AI begins and ends with people, with the who: Who is building the AI systems? Who do these systems take power from and give power to? Who is using these systems and why? In their fairness checklist and interpretability tools papers, they and their co-authors look specifically at those developing AI systems, determining that practitioners need to be involved in the development of the tools and resources designed to help them in their work.<\/p>\n
By putting people first, Wallach and Wortman Vaughan contribute to a support network that includes resources and also<\/em> reinforcements for using those resources, whether that be in the form of a community of likeminded individuals like in WiML, a comprehensive checklist for sparking dialogue that will hopefully result in more trustworthy systems, or feedback from teams on the ground to help ensure tools deliver on their promise of helping to make responsible AI achievable.<\/p>\n","protected":false},"excerpt":{"rendered":"At the 2005 Conference on Neural Information Processing Systems, researcher Hanna Wallach found herself in a unique position\u2014sharing a hotel room with another woman. Actually, three other women to be exact. In the previous years she had attended, that had never been an option because she didn\u2019t really know any other women in machine learning. […]<\/p>\n","protected":false},"author":38838,"featured_media":660567,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"footnotes":""},"categories":[1],"tags":[],"research-area":[13556],"msr-region":[],"msr-event-type":[],"msr-locale":[268875],"msr-post-option":[],"msr-impact-theme":[],"msr-promo-type":[],"msr-podcast-series":[],"class_list":["post-659955","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-research-blog","msr-research-area-artificial-intelligence","msr-locale-en_us"],"msr_event_details":{"start":"","end":"","location":""},"podcast_url":"","podcast_episode":"","msr_research_lab":[],"msr_impact_theme":[],"related-publications":[],"related-downloads":[],"related-videos":[],"related-academic-programs":[],"related-groups":[372368],"related-projects":[],"related-events":[641571],"related-researchers":[{"type":"user_nicename","value":"Harsha 
Nori","user_id":41461,"display_name":"Harsha Nori","author_link":"Harsha Nori<\/a>","is_active":false,"last_first":"Nori, Harsha","people_section":0,"alias":"hanori"}],"msr_type":"Post","featured_image_thumbnail":"","byline":"","formattedDate":"May 19, 2020","formattedExcerpt":"At the 2005 Conference on Neural Information Processing Systems, researcher Hanna Wallach found herself in a unique position\u2014sharing a hotel room with another woman. Actually, three other women to be exact. In the previous years she had attended, that had never been an option because…","locale":{"slug":"en_us","name":"English","native":"","english":"English"},"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/659955"}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/users\/38838"}],"replies":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/comments?post=659955"}],"version-history":[{"count":13,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/659955\/revisions"}],"predecessor-version":[{"id":929022,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/659955\/revisions\/929022"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media\/660567"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=659955"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/categories?post=659955"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/tags?post=659955"},{"taxonomy":"msr-researc
h-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=659955"},{"taxonomy":"msr-region","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-region?post=659955"},{"taxonomy":"msr-event-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-event-type?post=659955"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=659955"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=659955"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=659955"},{"taxonomy":"msr-promo-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-promo-type?post=659955"},{"taxonomy":"msr-podcast-series","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-podcast-series?post=659955"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}