Intro to Machine Learning in Customer Insights

Authors:
Tommy Guy, Sally Kellaway, Zachary Cook, Julie Koesmarno

Microsoft Dynamics 365 Customer Insights accelerates time to value with Machine Learning-based predictions covering Product recommendations, Churn risk, Sentiment analysis and Customer lifetime value scenarios. These features were developed using vast data sets and advanced analytics to provide a comprehensive and timely understanding of customers. The complexity of these artificial intelligence features also introduces new questions about the scope and meaning of the results they produce. This article introduces some key concepts in Customer Insights’ Machine Learning capabilities. Our goal is to demystify Machine Learning and help Customer Insights users gain more confidence in the predictions they can generate with these features.

Machine Learning is a Generalization of Patterns in Data
For sellers and marketers, growing a business requires a wealth of domain knowledge. This expertise is necessary to identify campaigns and activities that will help drive revenue. One activity might be to identify products with a high purchase rate to recommend to customers in marketing campaigns. For instance, Cameron is a marketing specialist for Contoso Coffee. Cameron’s experience may indicate that people who buy a new grinder are likely to also buy whole bean coffee. Cameron has also found that these customers are less likely to purchase ground coffee in the future.

Simple rules like this are called heuristics. Heuristics don’t scale well because each rule created requires the domain expert (in this case, Cameron) to encode their personal experience and direct observations from the data into the rule. Human-generated heuristics may encode biases (e.g., Cameron might have a subconscious preference for whole bean coffee and may look for confirming evidence) and may also suffer from the human expert’s limited breadth of experience (e.g., Cameron might have only worked for Contoso for 3 months).

Nevertheless, good human-generated heuristics and good Machine Learning-generated predictions rely on the same methods for reducing bias:

  1. Cameron should check their intuition by assessing whether the patterns exist across longer time periods.
  2. When Cameron recommends whole bean coffee, they should be able to explain why that recommendation was made.
  3. Cameron should continue to reassess and re-validate their intuition to make sure the heuristic still holds. For instance, a shift to working from home (WFH) could change coffee consumption habits in ways that are hard to predict.

What are the components of a Machine Learning Prediction?
Like all Machine Learning products, the product recommendation model offered in Customer Insights improves on human-generated heuristic rules. It has a few key components:

  1. A business objective, which is the thing we are trying to optimize. If we are trying to predict future purchases by customers, the objective is to identify products or services that have the highest chance of being purchased.
  2. An algorithm, which is the ‘program’ that will consume input data and make a prediction. Machine Learning algorithms contain variables that can be optimized during training to produce the most accurate predictions. The values of variables that optimize the predictions are called a model. For example, every Customer Insights product recommendation model starts with the same algorithm, but the model is trained to give optimal recommendations based on the product set and the purchase patterns of the dataset that the user inputs. We train machine learning models by “replaying history” to see if the model accurately predicts what happened in the data and update it to make better predictions. A recommendation algorithm may train a model by taking every customer’s purchase history and using all previously purchased items to predict the last item that each customer purchased.
  3. Features, which describe aspects of the data. As an example, the product recommendation algorithm may create features by categorizing items in our catalogue (ground vs. whole bean coffee, or beans vs. grinders vs. espresso machines). Other features may be manufacturers, colors, etc. Features are important for Machine Learning algorithms because they represent facts about input data that can be used to generate the predictions. Some features are constructed by the algorithm rather than extracted directly from the data. For instance, the Customer Lifetime Value prediction model computes the time between purchases from the total list of transactions because the frequency of purchase may be an important predictor of lifetime value in some scenarios.
Figure 1- High level workflow from business objective to model predictions
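
To make “replaying history” and constructed features more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the Customer Insights implementation: the function names, customer IDs, and purchase data are all hypothetical.

```python
from datetime import date


def leave_last_out(histories):
    """Turn each ordered purchase history into (customer, inputs, target).

    The last purchased item is the value a model would try to predict;
    every earlier item is model input. Customers with fewer than two
    purchases are skipped because they have no earlier items to learn from.
    """
    examples = []
    for customer, items in histories.items():
        if len(items) < 2:
            continue
        examples.append((customer, items[:-1], items[-1]))
    return examples


def mean_days_between_purchases(purchase_dates):
    """Constructed feature: average gap in days between consecutive purchases."""
    ordered = sorted(purchase_dates)
    gaps = [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]
    return sum(gaps) / len(gaps) if gaps else None


# Hypothetical purchase histories, oldest purchase first.
histories = {
    "c1": ["grinder", "whole bean coffee"],
    "c2": ["kettle", "french press", "ground coffee"],
    "c3": ["espresso machine"],  # only one purchase, so it is skipped
}
examples = leave_last_out(histories)

# Hypothetical transaction dates for one customer: gaps of 14 and 28 days.
gap = mean_days_between_purchases(
    [date(2021, 1, 1), date(2021, 1, 15), date(2021, 2, 12)]
)
```

A training loop would compare the model’s prediction for each `inputs` list against its `target` and adjust the model’s variables when the two disagree. The gap feature is not stored anywhere in the raw transactions; it is derived from them.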

How do Machine Learning Predictions differ from Human Generated Rules?
Different models have different components and details that can be tuned to help generate a wide array of business predictions. We’ll dive into these details in future blog posts and interviews. In this introduction, we’ll discuss two of the key differences between Machine Learning models and human-generated rules.

  1. Machine Learning can consume vastly more data than a human can read and understand. This makes it impossible for a human to fully simulate every possible decision the model will make (e.g., it is impossible to list all possible categories of items that customers may have previously purchased for all of time). In fact, good models can generalize well to inputs that never occurred in the training data. In Contoso Coffee’s transaction history, no customer may ever have bought a specific coffee grinder and French press together, but our product recommendation algorithm should be able to generalize that such a customer may be interested in an electric kettle. However, because we cannot “check the model’s homework” on every decision, it can be harder to know whether to trust a Machine Learning model. Below, we discuss how Customer Insights validates our prediction models.
  2. Machine Learning algorithms are only successful if the dataset contains features that describe the incoming data. In the example above, Cameron used their past knowledge to recommend whole bean coffee instead of ground coffee. Cameron implicitly used features of those products: they understood that coffee grinders are only useful when the coffee isn’t already ground. Unless we include useful features like product category in our training data, the Machine Learning algorithm can’t use that ‘fact’ to learn the best model. For example, if “ground vs. whole” is not a feature on bean products, a model cannot learn to suggest one over the other. Our predictions will only be as good as the data used to train the model.
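
The two points above can be illustrated together with a small Python sketch. It assumes a deliberately simple counting “model”, which is not the algorithm Customer Insights uses, and a hypothetical catalogue. The same history is learned twice: once keyed on exact product names and once keyed on a product category feature.

```python
from collections import Counter, defaultdict

# Hypothetical catalogue: product name -> category feature.
CATEGORY = {
    "burr grinder A": "grinder",
    "blade grinder B": "grinder",
    "french press S": "french press",
    "french press L": "french press",
}


def train(history, key):
    """Count which item followed each set of keys seen in the history."""
    counts = defaultdict(Counter)
    for previous_items, next_item in history:
        counts[frozenset(key(item) for item in previous_items)][next_item] += 1
    return counts


def recommend(counts, previous_items, key):
    """Return the most frequent follow-up item, or None if the input is unseen."""
    seen = counts.get(frozenset(key(item) for item in previous_items))
    return seen.most_common(1)[0][0] if seen else None


def by_name(item):
    return item


def by_category(item):
    return CATEGORY[item]


# Hypothetical history: (earlier purchases, next purchase).
history = [
    (["burr grinder A", "french press S"], "electric kettle"),
    (["blade grinder B", "french press S"], "electric kettle"),
]

# This exact pair of products never appears together in the history.
new_customer = ["blade grinder B", "french press L"]

name_based = recommend(train(history, by_name), new_customer, by_name)
category_based = recommend(train(history, by_category), new_customer, by_category)
```

Keyed on product names, the sketch has nothing to say about the unseen pair. Keyed on categories, it can reuse the “grinder plus French press” pattern. The category feature is what makes the generalization possible.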

Is My Model Trustworthy?
Customer Insights applies several techniques to help you trust a model in the process of making business decisions. This section describes some of the quality and performance testing that Machine Learning engineers conduct when they build a new model. Stay tuned for more details about them in subsequent posts.

Is my model “overfitting” to the training data?
Overfitting occurs when a Machine Learning model works well on its training data but produces poor results when it is applied to other data. Consider the eight customer purchase logs below. Three customers buy a kettle and, of those, two buy a French Press. It’s reasonable to suggest a French Press to anyone who buys a kettle. But only one customer (Purchase 5) bought a coffee machine and a kettle. If the model recommends a kettle for Purchase 1, it’s likely overfitting based on one observation.

Figure 2- 8 purchase logs from Contoso Coffee transactions.
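
The reasoning above can be sketched in Python by counting how many logs back each candidate rule. The eight logs below are hypothetical stand-ins that match the counts described in the text; they are not the data shown in Figure 2. The minimum evidence threshold is also an assumption made for illustration.

```python
def rule_stats(logs, antecedent, consequent):
    """Return (supporting_logs, confidence) for the rule antecedent -> consequent.

    supporting_logs: number of logs that contain both items.
    confidence: share of logs with the antecedent that also contain the consequent.
    """
    with_antecedent = [log for log in logs if antecedent in log]
    supporting = sum(1 for log in with_antecedent if consequent in log)
    confidence = supporting / len(with_antecedent) if with_antecedent else 0.0
    return supporting, confidence


# Hypothetical purchase logs (index 0 is Purchase 1, index 4 is Purchase 5).
logs = [
    {"coffee machine", "ground coffee"},
    {"kettle", "french press"},
    {"grinder", "whole bean coffee"},
    {"kettle", "french press", "ground coffee"},
    {"coffee machine", "kettle"},
    {"grinder", "whole bean coffee"},
    {"espresso machine"},
    {"ground coffee"},
]

MIN_SUPPORTING_LOGS = 2  # assumed threshold: ignore rules seen only once

kettle_rule = rule_stats(logs, "kettle", "french press")
machine_rule = rule_stats(logs, "coffee machine", "kettle")
trusted = [
    name
    for name, (supporting, _) in [
        ("kettle -> french press", kettle_rule),
        ("coffee machine -> kettle", machine_rule),
    ]
    if supporting >= MIN_SUPPORTING_LOGS
]
```

The kettle rule is backed by two logs, while the coffee machine rule rests on a single purchase, so a model that acts on it is memorizing one observation rather than learning a pattern.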

Customer Insights uses an automated procedure called cross validation to detect overfitting during training. During model training, the algorithm divides the training dataset into 10 subsets. Then, it trains 10 models that each use 9/10 of the data. Each model is used to predict the remaining 1/10 of the data, also called the holdout set. If the predictions on the holdout sets are accurate, we have good evidence that the final model will work on new data in the future. If the algorithm overfits, then Machine Learning engineers will modify the algorithm or its features to generalize more effectively.
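
The procedure can be sketched in a few lines of Python. This is a generic 10-fold cross validation loop under stated assumptions, not the Customer Insights code: the toy majority-vote model, the function names, and the data are hypothetical.

```python
import random
from collections import Counter, defaultdict


def k_fold_indices(n_rows, k=10, seed=0):
    """Shuffle row indices and deal them into k roughly equal folds."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(rows, labels, train_fn, predict_fn, k=10):
    """Train k models, each on k-1 folds, and score each on its holdout fold."""
    scores = []
    for holdout in k_fold_indices(len(rows), k):
        holdout_set = set(holdout)
        train_idx = [i for i in range(len(rows)) if i not in holdout_set]
        model = train_fn([rows[i] for i in train_idx], [labels[i] for i in train_idx])
        correct = sum(predict_fn(model, rows[i]) == labels[i] for i in holdout)
        scores.append(correct / len(holdout))
    return scores


def train_majority(rows, labels):
    """Toy model: remember the most common label seen for each input value."""
    votes = defaultdict(Counter)
    for row, label in zip(rows, labels):
        votes[row][label] += 1
    return {row: counts.most_common(1)[0][0] for row, counts in votes.items()}


def predict(model, row):
    return model.get(row)


# Hypothetical data: last item bought -> next item bought.
rows = ["grinder"] * 20 + ["kettle"] * 20
labels = ["whole bean coffee"] * 20 + ["french press"] * 20

scores = cross_validate(rows, labels, train_majority, predict)
```

Each entry in `scores` is the accuracy of one model on data it never saw during training. Holdout accuracy that is much lower than accuracy on the training data is the usual sign of overfitting.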

How do you know what a model uses to make decisions?
Cameron can clearly explain why they recommend whole bean coffee to grinder buyers. It can be more challenging to understand which features a Machine Learning model uses to make predictions.

Machine Learning engineers talk about feature importance when explaining how a model works. A feature’s importance is the amount of improvement in the model’s accuracy that we gain from including that one feature. So, a feature has no importance if removing it from the data set has no impact on the model’s decisions. A feature has high importance if removing it from the data makes the model significantly worse.
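
One way to compute importance as defined above is to retrain the model without each feature and measure the drop in accuracy. The Python sketch below does this with a toy lookup-table model. It is an assumption-laden illustration, not the method behind the Customer Insights importance table: the feature names, the data, and the model are hypothetical.

```python
from collections import Counter, defaultdict


def train(rows, labels, feature_names):
    """Toy model: majority-vote lookup table keyed on the chosen features."""
    table = defaultdict(Counter)
    for row, label in zip(rows, labels):
        table[tuple(row[f] for f in feature_names)][label] += 1
    default = Counter(labels).most_common(1)[0][0]
    return table, default, feature_names


def accuracy(model, rows, labels):
    """Share of rows for which the model's prediction matches the label."""
    table, default, feature_names = model
    hits = 0
    for row, label in zip(rows, labels):
        votes = table.get(tuple(row[f] for f in feature_names))
        prediction = votes.most_common(1)[0][0] if votes else default
        hits += prediction == label
    return hits / len(rows)


def drop_feature_importance(train_set, test_set, feature_names):
    """Importance of a feature = test accuracy lost when it is removed."""
    (train_rows, train_labels), (test_rows, test_labels) = train_set, test_set
    full_model = train(train_rows, train_labels, feature_names)
    baseline = accuracy(full_model, test_rows, test_labels)
    importance = {}
    for name in feature_names:
        kept = [f for f in feature_names if f != name]
        reduced_model = train(train_rows, train_labels, kept)
        importance[name] = baseline - accuracy(reduced_model, test_rows, test_labels)
    return importance


def make_accounts(n):
    """Hypothetical accounts: churn depends only on support cases, not logo color."""
    colors = ["red", "green", "blue"]
    rows = [
        {"many_support_cases": i % 2 == 0, "logo_color": colors[i % 3]}
        for i in range(n)
    ]
    labels = [row["many_support_cases"] for row in rows]  # True means churned
    return rows, labels


importance = drop_feature_importance(
    make_accounts(12), make_accounts(6), ["many_support_cases", "logo_color"]
)
```

In this toy data, removing `many_support_cases` costs half of the model’s accuracy, while removing `logo_color` costs nothing, so only the first feature has any importance.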

Figure 3- Feature importance table that is available with Business-to-Business Churn prediction. High impact features increase the risk of a business account churning, with the % impact listed as the importance of that one feature.

The Customer Insights Business-to-Business Churn model generates detailed information about the importance of features used to generate its predictions. In this example, features like Customer Service support activities were important in indicating high churn, as were customer traits like the city where the customer is located. Stay tuned for a deeper dive into feature importance in future blog posts.

Machine Learning in Customer Insights
In Customer Insights, you can use Machine Learning to identify advanced patterns across comprehensive data sets. It’s important to understand how those patterns emerge: algorithms examine historical data to identify patterns that can be used on new data in the future. Customer Insights prediction features are built with industry best practices to ensure that models are accurate and trustworthy. You can help improve the accuracy of these predictions by providing training data with features that help the model identify trends and patterns for your business.

In future posts, we will share deep dives about how we create accurate and trustworthy Machine Learning models in Customer Insights and about how you can help improve the predictions you generate with those models.