{"id":822961,"date":"2022-03-01T15:23:15","date_gmt":"2022-03-01T23:23:15","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-blog-post&p=822961"},"modified":"2022-03-15T10:23:06","modified_gmt":"2022-03-15T17:23:06","slug":"explainability","status":"publish","type":"msr-blog-post","link":"https:\/\/www.microsoft.com\/en-us\/research\/articles\/explainability\/","title":{"rendered":"Explainability"},"content":{"rendered":"
AI models are becoming a normal part of many business operations, driven by advances in AI technologies and the democratization of AI. While AI is increasingly important in decision making, it can be challenging to understand what influences the outcomes of AI models. Critical details like the information used as input, the influence of missing data, and the use of unintended or sensitive input variables can all have an impact on a model's output. To use AI responsibly and to trust it enough to make decisions, we must have tools and processes in place to understand how a model reaches its conclusions.
Microsoft Dynamics 365 Customer Insights goes beyond just a predicted outcome and provides additional information that helps you better understand the model and its predictions. Using the latest AI technologies, Customer Insights surfaces the main factors that drive our predictions. In this blog post, we will talk about how Customer Insights' out-of-the-box AI models enable enterprises to better understand and trust AI models, as well as what actions can be taken based on the added model interpretability.
Figure 1: Explainability information on the results page of the Customer Lifetime Value out-of-box model, designed to help you interpret model results.

What is model interpretability and why is it important?

AI models are sometimes described as black boxes that consume information and output a prediction, where the inner workings are unknown. This raises serious questions about our reliance on AI technology. Can the model's prediction be trusted? Does the prediction make sense? AI model interpretability has emerged over the last few years as an area of research with the goal of providing insights into how AI models reach decisions.

AI models leverage information from the enterprise (data about customers, transactions, historical data, etc.) as inputs. We call these inputs features. Features are used by the model to determine the output. One way to achieve model interpretability is explainable AI, or model explainability: a set of techniques that describe which features influence a prediction. We'll talk about two approaches: local explainability, which describes how the model arrived at a single prediction (say, a single customer's churn score), and global explainability, which describes which features are most useful for making all predictions. Before we describe how a model produces explainability output and how you should interpret it, we need to describe how we construct features from input data.

AI Feature Design with Interpretability in mind

AI models are trained using features, which are transformations of raw input data that make it easier for the model to use. These transformations are a standard part of the model development process.

For instance, the input data may be a list of transactions with dollar amounts, but a feature might be the number of transactions in the last thirty days or the average transaction value. (Many features summarize more than one input row.) Before features are created, raw input data needs to be prepared and "cleaned". In a future post, we'll take a deep dive into data preparation and the role that model explainability plays in it.
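To make that idea concrete, here is a minimal sketch of this kind of feature construction. The table and column names (customer_id, transaction_date, amount) are hypothetical and purely illustrative; this is not the actual Customer Insights pipeline or schema.

```python
import pandas as pd

# Hypothetical raw input: one row per transaction.
transactions = pd.DataFrame({
    "customer_id": [1, 1, 1, 2, 2, 3],
    "transaction_date": pd.to_datetime([
        "2022-02-25", "2022-02-10", "2022-01-05",
        "2022-02-20", "2021-12-01", "2022-02-28",
    ]),
    "amount": [4.50, 12.00, 8.25, 150.00, 90.00, 30.00],
})

as_of = pd.Timestamp("2022-03-01")
last_30_days = transactions[transactions["transaction_date"] >= as_of - pd.Timedelta(days=30)]

# Each feature summarizes many raw input rows into one value per customer.
features = pd.DataFrame({
    "transactions_last_30_days": last_30_days.groupby("customer_id").size(),
    "avg_transaction_value": transactions.groupby("customer_id")["amount"].mean(),
}).fillna({"transactions_last_30_days": 0})

print(features)
```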
To provide a more concrete example of what a feature is and how it might matter to the model's prediction, take two features that might help predict customer churn: frequency of transactions and number of product types bought. In a coffee shop, frequency of transactions is likely a great predictor of continued patronage: the regulars who walk by every morning will likely continue to do so. But those regulars may always get the same thing: I always get a 12 oz black Americano and never get a mochaccino or a sandwich. That means the number of product types I buy isn't a good predictor of my churn: I buy the same product, but I get it every morning.

Conversely, the bank down the road may observe that I rarely visit the branch to transact. However, I've got a mortgage, two bank accounts, and a credit card with that bank. The bank's churn predictions might rely on the number of products and services bought rather than the frequency of buying a new product. Both models start with the same set of facts (frequency of transactions and number of product types) and predict the same thing (churn) but have learned to use different features to make accurate predictions. Model authors created a pair of features that might be useful, but the model ultimately decides how, or whether, to use those features based on the context.

Feature design also requires understandable names for the features. If a user doesn't know what a feature means, it's hard for them to make sense of the fact that the model thinks it's important! During feature construction, AI engineers work with product managers and content writers to create human-readable names for every feature. For example, a feature representing the average number of transactions for a customer in the last quarter could look something like 'avg_trans_last_3_months' in the data science experimentation environment. If we were to present features like this to business users, it could be difficult for them to understand exactly what that means.
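One lightweight way to picture that naming step is a lookup table maintained alongside the feature definitions. The names and mapping below are hypothetical, not the actual Customer Insights feature catalog.

```python
# Illustrative mapping from internal feature names to human-readable display names.
FEATURE_DISPLAY_NAMES = {
    "avg_trans_last_3_months": "Average number of transactions per month (last quarter)",
    "transactions_last_30_days": "Number of transactions in the last 30 days",
    "avg_transaction_value": "Average transaction value",
}

def display_name(feature: str) -> str:
    """Fall back to the raw feature name if no curated display name exists."""
    return FEATURE_DISPLAY_NAMES.get(feature, feature)

print(display_name("avg_trans_last_3_months"))
```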
Explainability via Game Theory

A main goal in model explainability is to understand the impact of including a feature in a model. For instance, one could train a model with all the features except one, then train a model with all the features. The difference in accuracy between the two models' predictions is a measure of the importance of the feature that was left out. If the model with the feature is much more accurate than the model without it, then the feature was very important.
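Here is a minimal sketch of that leave-one-out intuition on synthetic data (illustrative only; as described below, Customer Insights does not compute importance this way, it uses Shapley values instead):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for customer features and a churn label.
X, y = make_classification(n_samples=2000, n_features=8, n_informative=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

def accuracy_using(columns):
    """Train and score a model that only sees the given feature columns."""
    model = RandomForestClassifier(random_state=0).fit(X_train[:, columns], y_train)
    return accuracy_score(y_test, model.predict(X_test[:, columns]))

all_columns = list(range(X.shape[1]))
baseline = accuracy_using(all_columns)

# A feature's importance is the accuracy lost when the model is trained without it.
for col in all_columns:
    drop = baseline - accuracy_using([c for c in all_columns if c != col])
    print(f"feature {col}: accuracy drop when left out = {drop:+.3f}")
```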
Figure 2: The basic idea behind computing explainability is to understand each feature's contribution to the model's performance by comparing the performance of the whole model to its performance without that feature. In practice, we use Shapley values to identify each feature's contribution, including interactions, in one training cycle.

There are nuances related to feature interaction (e.g., including both city name and zip code may be redundant: removing one won't impact model performance, but removing both would), but the basic idea remains the same: how much does including a feature contribute to model performance?

With hundreds of features, it's too expensive to retrain the model leaving each feature out one by one. Instead, we use a concept called Shapley values to identify feature contributions from a single training cycle. Shapley values are a technique from game theory, where the goal is to understand the gains and costs of several actors working in a coalition. In machine learning, the "actors" are features, and the Shapley value algorithm can estimate each feature's contribution even when it interacts with other features.

If you are looking for (much!) more detail about Shapley analysis, a good place to start is this GitHub repository: GitHub - slundberg/shap: A game theoretic approach to explain the output of any machine learning model.
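As an illustration, here is a minimal sketch of computing Shapley-value explanations with that shap package, on a synthetic regression model standing in for something like a lifetime-value predictor. The data and model are made up for the example; this is not the Customer Insights implementation.

```python
import shap
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in for customer features and a lifetime-value target.
X, y = make_regression(n_samples=1000, n_features=8, n_informative=4, random_state=0)
model = RandomForestRegressor(random_state=0).fit(X, y)

# TreeExplainer estimates each feature's Shapley contribution from a single trained model.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)  # shape: (n_samples, n_features)

# Local explainability: how each feature pushed one customer's prediction up or down.
print(shap_values[0])

# Global explainability: average magnitude of each feature's contribution across all customers.
shap.summary_plot(shap_values, X, plot_type="bar")
```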