{"id":10578,"date":"2018-05-31T16:27:23","date_gmt":"2018-05-31T23:27:23","guid":{"rendered":"https:\/\/www.microsoft.com\/insidetrack\/blog\/?p=10578"},"modified":"2023-06-15T14:54:43","modified_gmt":"2023-06-15T21:54:43","slug":"applying-the-power-of-azure-machine-learning-to-improve-sap-incident-management","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/insidetrack\/blog\/applying-the-power-of-azure-machine-learning-to-improve-sap-incident-management\/","title":{"rendered":"Applying the power of Azure Machine Learning to improve SAP incident management"},"content":{"rendered":"
This content has been archived. While it was correct at the time of publication, it may no longer be accurate or reflect the current situation at Microsoft.<\/p>\n<\/div>\n<\/div>\n
One important aspect of digital transformation is embracing modern technologies and processes that can improve the customer experience. Microsoft found a perfect opportunity to do this\u2014we used Azure Machine Learning and AI to automate the triage component of our SAP incident management process. Our solution reduced the mean time to resolve SAP user issues, increased incident routing accuracy to 99 percent, and freed staff to focus on more strategic aspects of their roles.<\/p>\n
As enterprises continue to move into the digital world, an ever-increasing portion of their operations can benefit from the cloud, which can improve scalability, enable a mobile workforce, and reduce data storage costs. But what about the human factor? As a technical decision maker, have you considered how going digital can also drive better customer service?<\/p>\n
At Microsoft, we\u2019re continuing our digital transformation journey, where our IT and product teams regularly collaborate to identify and solve challenges that exist within the enterprise. One example is the recent joint initiative of Microsoft Core Services Engineering and Operations (CSEO) and the Azure product team: we incorporated AI and machine learning (ML) technologies into our SAP incident management process to improve support ticket routing accuracy and significantly reduce incident resolution time.<\/p>\n
Our Operations organization at Microsoft continuously searches for ways to make processes more efficient and to improve the user experience by providing self-service solutions or implementing self-correcting routines that prevent emerging issues before they occur. We\u2019ve learned that the more complex the process, the bigger the challenge\u2014and, when we build a successful solution, the greater the reward.<\/p>\n
SAP incident management is one process we identified for improvement. Supporting our SAP users requires a wide variety of domain-specific knowledge. Our SAP Support personnel are divided among several teams, each specializing in a particular functional or technical area. Microsoft has more than 125,000 employees, plus customers, vendors, and partners who all touch an SAP system at some point in the course of doing business. At that scale, our SAP Support teams handle thousands of incidents each month.<\/p>\n
Traditionally, a support staff member first triaged each SAP incident to decide which of five SAP Support groups should handle it: SAP Technical, SAP Human Capital Management, SAP Supply Chain Management, SAP Business Intelligence, or SAP Finance & Master Data Governance. The incident was then placed in that team\u2019s queue to be resolved. As part of our ongoing efforts to improve SAP processes, we reviewed incident management processes and operations with the SAP Support staff and discovered an inefficiency in this routing step. As illustrated in Figure 1, our analysis of email requests sent to SAP Support revealed an average 30-minute delay between the time a new incident first landed in the assignment group queue and the time a staff member assigned it to the appropriate SAP Support team\u2019s queue.<\/p>\n
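The triage delay above is a simple average over incident timestamps. The Python sketch below shows that calculation under assumed inputs. Each record is a hypothetical pair of timestamps: arrival in the triage queue, then assignment to an SAP Support team. The sample values are invented and are not data from the actual analysis.

```python
from datetime import datetime
from statistics import mean


def mean_triage_delay_minutes(records):
    """Average minutes between an incident landing in the triage queue and its assignment."""
    if not records:
        raise ValueError("no incidents to measure")
    delays = [(assigned - landed).total_seconds() / 60 for landed, assigned in records]
    return mean(delays)


# Hypothetical sample: (landed in triage queue, assigned to an SAP Support team)
records = [
    (datetime(2018, 3, 1, 9, 0), datetime(2018, 3, 1, 9, 25)),
    (datetime(2018, 3, 1, 9, 10), datetime(2018, 3, 1, 9, 45)),
]
print(mean_triage_delay_minutes(records))  # prints 30.0
```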
This delay lengthened our mean time to resolve (MTTR) and degraded the experience of our internal customers. One option would have been to give support staff additional training focused on faster triage. However, we saw this issue as an excellent opportunity to automate the triage component by using AI and Azure Machine Learning.<\/p>\n
<\/p>\n
Machine learning is particularly good at prediction, classification, and anomaly detection, and incident routing is fundamentally a classification problem. Incorporating Azure Machine Learning into our SAP incident routing was therefore a natural fit, especially when considering these criteria:<\/p>\n
After we decided to solve the triage delay by using AI, we engaged our Microsoft Data Science team to design the AI model. Our data scientists chose to approach the solution with two different methodologies, as detailed below.<\/p>\n
In the first approach, the knowledge base model, the key assumption was that we could repurpose the\u00a0data dictionaries<\/i>, the lists of keywords used for human triage and for routing incidents to each of the five SAP Support groups. This approach required an existing dataset that included a data dictionary containing all the information we needed to extract, and in this situation we had exactly that. We provided the data scientists with the existing data dictionaries for each of the five routing queues.<\/p>\n
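The following Python sketch illustrates how keyword dictionaries can drive routing. It counts dictionary hits for each SAP Support group and picks the highest-scoring group. Several details are assumptions: the dictionary contents, the tie-breaking rule, and the fallback to human triage when nothing matches. The post does not publish the real dictionaries or the exact scoring the data scientists used.

```python
import re
from collections import Counter

# Hypothetical, heavily abbreviated data dictionaries. The real keyword lists
# used by SAP Support triage staff are not published; these are placeholders.
DATA_DICTIONARIES = {
    "SAP Technical": {"transport", "basis", "authorization", "dump"},
    "SAP Human Capital Management": {"payroll", "employee", "benefits", "timesheet"},
    "SAP Supply Chain Management": {"inventory", "shipment", "warehouse", "delivery"},
    "SAP Business Intelligence": {"report", "dashboard", "query", "bw"},
    "SAP Finance & Master Data Governance": {"invoice", "ledger", "vendor", "posting"},
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def route_by_keywords(text, dictionaries=DATA_DICTIONARIES):
    """Score each group by how many tokens hit its dictionary and return the best group.

    Returns None when no keyword matches, leaving the incident for human triage
    (an assumed fallback, not something the post describes).
    """
    counts = Counter(tokenize(text))
    scores = {group: sum(counts[w] for w in words) for group, words in dictionaries.items()}
    best = max(scores, key=scores.get)  # ties go to the first group listed
    return best if scores[best] > 0 else None


print(route_by_keywords("Vendor invoice posting failed"))
# prints SAP Finance & Master Data Governance
```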
The following steps summarize the process that our data scientists used to develop the knowledge base model.<\/p>\n
In the second approach, the AI model, the key assumption was that no data dictionary existed for any of the SAP incident queues. The data scientists had the same dataset as in the previous method, but it was their only source of information, so the model had to learn the routing from that dataset alone.<\/p>\n
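The post does not say which learning algorithm the data scientists used for the AI model. The sketch below is a stand-in, chosen only because it fits in a few lines of standard-library Python. It trains a multiclass perceptron on hashed bag-of-words features, using only labeled historical incidents and no dictionaries. The training examples, group order, and bucket count are hypothetical.

```python
import re
import zlib

N_BUCKETS = 2 ** 20  # hypothetical size of the hashed feature space


def hashed_features(text, n_buckets=N_BUCKETS):
    """Bag of words mapped into a fixed number of buckets with a stable hash (CRC32)."""
    feats = {}
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        bucket = zlib.crc32(token.encode("utf-8")) % n_buckets
        feats[bucket] = feats.get(bucket, 0.0) + 1.0
    return feats


class MulticlassPerceptron:
    """One sparse weight vector per SAP Support group, learned only from labeled history."""

    def __init__(self, labels):
        self.weights = {label: {} for label in labels}

    def _score(self, feats, label):
        w = self.weights[label]
        return sum(w.get(i, 0.0) * v for i, v in feats.items())

    def predict(self, text):
        feats = hashed_features(text)
        # Ties (for example, an untrained model) go to the first label listed.
        return max(self.weights, key=lambda label: self._score(feats, label))

    def fit(self, texts, labels, epochs=10):
        for _ in range(epochs):
            for text, gold in zip(texts, labels):
                guess = self.predict(text)
                if guess != gold:  # classic perceptron rule: update only on mistakes
                    for i, v in hashed_features(text).items():
                        self.weights[gold][i] = self.weights[gold].get(i, 0.0) + v
                        self.weights[guess][i] = self.weights[guess].get(i, 0.0) - v
        return self


GROUPS = [
    "SAP Technical",
    "SAP Human Capital Management",
    "SAP Supply Chain Management",
    "SAP Business Intelligence",
    "SAP Finance & Master Data Governance",
]
# Hypothetical labeled history: (incident text, group that resolved it)
history = [
    ("payroll run failed for employee", GROUPS[1]),
    ("inventory shipment stuck in warehouse", GROUPS[2]),
    ("invoice posting blocked for vendor", GROUPS[4]),
    ("dashboard report query is slow", GROUPS[3]),
    ("transport request dump in basis", GROUPS[0]),
]
model = MulticlassPerceptron(GROUPS).fit([t for t, _ in history], [g for _, g in history])
print(model.predict("payroll for new employee"))
```

With real data, the history would contain many past incidents labeled with the group that resolved them. Accuracy would then be measured on held-out incidents rather than on the training set.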
The following steps summarize the process the data scientists used to develop this AI model.<\/p>\n
We now had two viable models that we could apply to automate the triage component of the SAP incident management process. The AI model was significantly more accurate than the knowledge base model (85 percent versus 59.3 percent), but we did not want to discard the work that had gone into either approach. Could we gain additional value by combining the learnings from both models into a new integrated model, and what accuracy would that combination achieve?<\/p>\n
To explore this possibility, the data scientists generated features from the knowledge base model using similarity matching and features from the AI model using hash functions. They combined these features and ran them through Azure Machine Learning to train the integrated model. When tested on out-of-sample data, the integrated model reached 93 percent accuracy, significantly better than either model had achieved on its own.<\/p>\n
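The following is a minimal sketch of the feature combination. It assumes Jaccard similarity as the similarity-matching method and CRC32 as the hash function; the post names neither. Each incident becomes one sparse feature vector that holds a similarity score for each SAP Support group plus hashed token counts. In the actual solution, the combined features were passed to Azure Machine Learning for training, and this sketch does not reproduce that step. The dictionaries and bucket count are hypothetical placeholders.

```python
import re
import zlib

# Hypothetical, abbreviated keyword dictionaries standing in for the real ones.
DATA_DICTIONARIES = {
    "SAP Technical": {"transport", "basis", "dump"},
    "SAP Human Capital Management": {"payroll", "employee", "benefits"},
    "SAP Supply Chain Management": {"inventory", "shipment", "warehouse"},
    "SAP Business Intelligence": {"report", "dashboard", "query"},
    "SAP Finance & Master Data Governance": {"invoice", "ledger", "vendor"},
}
N_BUCKETS = 2 ** 20  # hypothetical hashed feature-space size


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def similarity_features(text):
    """Knowledge-base side: Jaccard similarity between the incident and each dictionary."""
    toks = set(tokenize(text))
    return {
        "sim:" + group: len(toks & words) / len(toks | words)
        for group, words in DATA_DICTIONARIES.items()
    }


def hashed_features(text):
    """AI-model side: token counts mapped into N_BUCKETS buckets with CRC32."""
    feats = {}
    for tok in tokenize(text):
        key = "hash:%d" % (zlib.crc32(tok.encode("utf-8")) % N_BUCKETS)
        feats[key] = feats.get(key, 0.0) + 1.0
    return feats


def integrated_features(text):
    """Concatenate both feature families into one sparse vector for a downstream learner."""
    return {**similarity_features(text), **hashed_features(text)}


print(integrated_features("vendor invoice stuck")["sim:SAP Finance & Master Data Governance"])
# prints 0.5
```

The distinct key prefixes keep the two feature families from overwriting each other. Any learner that accepts sparse features could consume the merged dictionary.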
After we selected the integrated model to use in our automation, we began a three-phase implementation process to validate that the model would perform as expected in a production environment.<\/p>\n
In the first phase, we spent two weeks running the integrated model against every new SAP incident that came in from an email request. During this phase, the model only recorded its routing decision in the incident\u2019s notes; the automated system was not yet controlling the routing. After each incident was closed, we compared the SAP group that the model had selected with the group to which the SAP support desk personnel had actually routed the incident. Accuracy during this phase exceeded our expectations, reaching 98.8 percent.<\/p>\n
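The first-phase check compares two labels for each closed incident. The following Python sketch computes that shadow-mode accuracy and compares it with the 80 percent production threshold that the project set at the outset. The record type and field names are hypothetical.

```python
from dataclasses import dataclass

PRODUCTION_THRESHOLD = 0.80  # minimum accuracy set at the start of the project


@dataclass
class ClosedIncident:
    incident_id: str
    ai_group: str     # routing decision the model wrote into the incident notes
    human_group: str  # group the SAP support desk actually routed the incident to


def shadow_accuracy(incidents):
    """Share of closed incidents where the model's note matches the human routing."""
    if not incidents:
        raise ValueError("no closed incidents to evaluate")
    matches = sum(i.ai_group == i.human_group for i in incidents)
    return matches / len(incidents)


def ready_for_production(incidents, threshold=PRODUCTION_THRESHOLD):
    """True when shadow-mode accuracy meets or beats the agreed threshold."""
    return shadow_accuracy(incidents) >= threshold
```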
This was far above the 80 percent minimum accuracy that we had set at the outset of the project as the signal that the solution was ready for production. Why an 80 percent threshold? We had determined that at that level, the 30 minutes saved on each correctly routed incident would more than make up for the additional time support staff would spend rerouting the 20 percent of incidents that were routed incorrectly, so overall MTTR would clearly improve.<\/p>\n
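A simplified cost model can check the threshold argument. This model is an assumption, not the exact analysis the team performed. Suppose each correctly routed incident saves 30 minutes, and each misrouted incident saves nothing and costs e extra minutes of rerouting. With accuracy p, the net saving per incident is p * 30 - (1 - p) * e. That saving stays positive while e < 30p / (1 - p). At p = 0.8, the bound is 120 minutes: misrouting could cost up to two extra hours per misrouted incident before automation stopped paying off.

```python
def break_even_extra_minutes(accuracy, minutes_saved=30.0):
    """Largest extra rerouting time (minutes) per misrouted incident that still nets a saving.

    Simplified model, an assumption rather than the team's actual analysis:
    net saving per incident = accuracy * minutes_saved - (1 - accuracy) * extra,
    which stays positive while extra < accuracy * minutes_saved / (1 - accuracy).
    """
    if not 0.0 <= accuracy < 1.0:
        raise ValueError("accuracy must be in [0, 1)")
    return accuracy * minutes_saved / (1.0 - accuracy)


def net_saving_per_incident(accuracy, extra_minutes, minutes_saved=30.0):
    """Average minutes saved per incident under the same simplified model."""
    return accuracy * minutes_saved - (1.0 - accuracy) * extra_minutes


print(round(break_even_extra_minutes(0.80), 6))  # prints 120.0
```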
With the latest accuracy rating at 98.8 percent, we were confident that the integrated model could make significant improvements to our production incident management system.<\/p>\n
Putting our automated triage solution into production for email-initiated support requests was a simple process of changing how incoming email requests were routed. Instead of being directed to the triage queue where the incident would await human review, email-based incidents would now be sent through the new automated process. The integrated model would scan the content for keywords, add a tag that identified the incident as being routed by AI, and then assign the ticket to the appropriate SAP group for remediation.<\/p>\n
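In code, the production change is small: classify the email text, tag the incident, and set its assignment group. The sketch below assumes a hypothetical Incident record, a hypothetical tag string, and a classify callable that stands in for the integrated model. None of these names come from the actual incident management system.

```python
from dataclasses import dataclass, field
from typing import Callable, List

AI_ROUTED_TAG = "routed-by-ai"  # hypothetical tag text; the real value is not published


@dataclass
class Incident:
    incident_id: str
    body: str
    assignment_group: str = "SAP Triage"  # hypothetical name for the old triage queue
    tags: List[str] = field(default_factory=list)


def route_email_incident(incident: Incident, classify: Callable[[str], str]) -> Incident:
    """Skip the human triage queue: predict the group, tag the incident, assign it."""
    incident.assignment_group = classify(incident.body)
    incident.tags.append(AI_ROUTED_TAG)
    return incident


# Usage with a stand-in classifier in place of the integrated model.
inc = route_email_incident(
    Incident("INC0001", "Payroll results missing for new hires"),
    classify=lambda text: "SAP Human Capital Management",
)
print(inc.assignment_group, inc.tags)
```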
After the integrated model went live in our production environment, we continued to monitor whether it was routing each incident correctly. To do so, we asked the SAP Support staff to take the following steps whenever they saw an incorrectly routed incident that was tagged as coming from AI:<\/p>\n
We used this data to retrain the model, improve its accuracy, and measure the solution\u2019s efficacy. To date, we\u2019re seeing an accuracy rate greater than 99 percent. In addition, the average 30-minute delay of the human-powered triage model has dropped to approximately one second, a large performance gain that is helping reduce our MTTR for email-based incidents.<\/p>\n
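The feedback loop can be sketched as a pass over resolved incidents. The sketch measures accuracy on the AI-tagged incidents and turns each staff-corrected misroute into a relabeled training example. The record fields are hypothetical. The sketch also assumes that the corrected group is available on the closed incident. The post does not describe the actual data format or the retraining pipeline.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ResolvedIncident:
    body: str
    ai_routed: bool   # carried the AI-routed tag
    ai_group: str     # group the model assigned
    final_group: str  # group that resolved it after any manual reroute


def collect_feedback(incidents: List[ResolvedIncident]) -> Tuple[float, List[Tuple[str, str]]]:
    """Return (live accuracy on AI-routed incidents, relabeled misroutes for retraining)."""
    routed = [i for i in incidents if i.ai_routed]
    if not routed:
        raise ValueError("no AI-routed incidents yet")
    correct = sum(i.ai_group == i.final_group for i in routed)
    relabeled = [(i.body, i.final_group) for i in routed if i.ai_group != i.final_group]
    return correct / len(routed), relabeled
```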
Up until this point, the implementation of our AI-based integrated model had focused on a single input modality: email-based SAP Support requests. However, CSEO offers several different support modalities so that people can reach technical support by whatever means suits them best, such as filling out an online support request or calling technical support by phone or through Skype.<\/p>\n
All these non-email inputs use a web-based form to initiate the support request. For example, when someone contacts support by chat or phone, the support agent enters the issue into a web-based form. Alternatively, users can fill out an online support request by entering the details into the web form themselves.<\/p>\n
In the original incident management system, every web-based support form set the routing field to the default support queue. Unless the user changed that field, the incident would sit for an average of 30 minutes until a support person triaged it and directed it to the appropriate SAP group. As illustrated in Figure 2, we saw an opportunity to improve both the user experience and the routing of these incidents by replacing this default queue entry with input from the integrated model.<\/p>\n
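The web-form change comes down to a single decision. Keep a queue that the user or agent explicitly chose; otherwise, use the model's prediction instead of the default queue. This Python sketch assumes a hypothetical default queue name and a classify callable that stands in for the integrated model. The post does not detail how the form and the model were actually wired together.

```python
from typing import Callable, Optional

DEFAULT_QUEUE = "SAP Triage"  # hypothetical name of the form's default routing queue


def choose_queue(description: str,
                 selected_queue: Optional[str],
                 classify: Callable[[str], str]) -> str:
    """Return the queue for a web-form request.

    A queue the user or agent explicitly chose is kept; otherwise the model's
    prediction replaces the default entry, so the request skips manual triage.
    """
    if selected_queue and selected_queue != DEFAULT_QUEUE:
        return selected_queue
    return classify(description)
```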
We achieved this by:<\/p>\n
We have now fully implemented this AI solution across our entire SAP incident management system, which handles an average of 5,000 incidents per month. Adding AI to the system has both reduced MTTR and increased routing accuracy across all our input sources.<\/p>\n
Adopting new technologies and platforms can be a difficult sell within an organization, especially if the technology hasn\u2019t previously been integrated into the company\u2019s systems. AI is one example: some stakeholders might consider solving business problems with cognitive computing too expensive and exotic. However, this is at the core of what digital transformation can deliver: changing old, siloed ways of thinking; automating operations and incorporating agile methodologies; moving systems to the cloud; and enhancing the user experience. The question for technical decision makers shouldn\u2019t be whether to embrace the new; instead, they should ask:\u00a0where do I start?<\/i><\/p>\n
At Microsoft, we\u2019re very excited about the power of AI and the value that it can bring to our organization. And we didn\u2019t have to look far to find a readily available machine learning engine: Azure offers a wide range of capabilities that include AI. Working with Azure Machine Learning Studio enabled us to build our solution without middleware or other third-party licensing, hardware, training, or support. Because we\u2019re already running all our SAP processes in Azure, we\u2019re extracting even more value out of our cloud investment by incorporating Azure\u2019s built-in AI capabilities into the existing system.<\/p>\n
By deploying Azure Machine Learning within our SAP incident management system, we\u2019ve significantly reduced the mean time to resolve (MTTR) issues that SAP users report. Our automated AI solution also lets us reassign the support staff who used to perform triage to work that is more strategic for the business.<\/p>\n
From the beginning, our intent has been to build an end-to-end solution that can be applied to many business scenarios surrounding our SAP implementation. This first Azure Machine Learning project has been a great success and is laying the groundwork for us to reuse the solution in other applications. We plan to continue our work with Azure Machine Learning and explore incorporating bot technology to streamline our entire suite of incident support systems, which has the potential to improve performance and the user experience in more than 1,500 apps.<\/p>\n
Ultimately, we want to expand our use of Azure Machine Learning and AI beyond support queue automation and explore how we might use it and other Azure products and services to solve new challenges. With our initial implementation completed, we now have a workflow in place to create additional machine learning algorithms and then deploy them to production where they can benefit other business-critical processes.<\/p>\n