As part of our continuing digital transformation journey, our Microsoft Digital Employee Experience (MDEE) team is constantly looking for ways to improve our business processes and detect issues and anomalies before they become serious problems.
These issues often occur sporadically within the application framework and go undetected. Even when a user does notice an anomaly, they need to decide how to react, which is time consuming.
To catch these issues early, we’re using Microsoft Azure Anomaly Detector to examine transactions across our SAP environment. In turn, this enables us to proactively improve the performance, consistency, and reliability of our entire SAP landscape.
Understanding the need for anomaly detection in SAP
At Microsoft, our SAP environment comprises many complex processes across multiple lines of business. To avoid having disparate environments and isolated monitoring and reporting data, we wanted to build a single-codebase solution for monitoring and anomaly detection that each line of business can use with minimal code implementation.
We wanted to build intelligence to detect anomalies and inconsistencies in business process flow to improve platform health. Improved platform health improves engineering service-level agreements (SLAs) and reduces revenue loss by being proactive rather than reactive.
There were hundreds of areas that could benefit from anomaly detection in our SAP portfolio, but we wanted to identify a single area for our pilot project. In the Master Data Management (MDM) space, we create thousands of objects representing business entities such as customers and business partners.
Most of these objects are created by using an application programming interface (API), and no human interaction is needed. However, it’s extremely difficult to identify whether MDM-related issues are occurring in upstream systems, so we needed a way to catch those issues early.
In the MDM space, we have SAP Master Data Governance (MDG) background processes, such as Customer Master data creation, which run without any user interaction. Across various batch and scheduled jobs, process runtime varies based on data volume, time of day, time of year, and resource availability.
Understanding the potential for issues in each process and the larger process environment involves several challenging questions, including:
- Is the transaction supposed to run that long?
- Is there a problem in an upstream system?
- Are there resources that reached their maximum capacity or that are creating a performance bottleneck?
Assessing Microsoft Azure Cognitive Services and Microsoft Azure Anomaly Detector
Detecting these issues through human triage was difficult, time consuming, and resource intensive. Many issues went undetected, resulting in poor customer experience and the loss of potential revenue, in addition to lost capacity that could have been used for more productive purposes.
To solve this problem, we required a solution that was reliable, scalable, and easy to integrate with our SAP systems. We wanted the solution to be process agnostic, implemented as a single codebase, and able to detect issues without human intervention.
The Microsoft Azure Anomaly Detector service, available within Microsoft Azure Cognitive Services, fits all our requirements.
The Anomaly Detector API enabled us to monitor and detect abnormalities in our data without having to know machine learning. The Anomaly Detector API’s algorithms adapt by automatically identifying and applying the best-fitting models to data, regardless of industry, scenario, or data volume, which greatly reduced our development efforts. Our primary steps were quite simple:
- Provision a service instance for Anomaly Detector in Microsoft Azure Cognitive Services.
- Start using the REST APIs in application code and interactions.
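As a rough illustration of the second step, the following Python sketch builds (but does not send) a request to the univariate batch-detection operation. The resource name and key are placeholders, `build_detect_request` is a hypothetical helper, and the path assumes version 1.0 of the REST API; it is not the code our team runs.

```python
import json
import urllib.request

# Placeholders (assumptions): substitute your own resource endpoint and key.
ENDPOINT = "https://example-resource.cognitiveservices.azure.com"
API_KEY = "your-subscription-key"

# Path of the v1.0 batch ("entire series") detection operation.
DETECT_PATH = "/anomalydetector/v1.0/timeseries/entire/detect"


def build_detect_request(series, granularity="hourly"):
    """Build (but do not send) a POST request for univariate detection."""
    body = json.dumps({"series": series, "granularity": granularity})
    return urllib.request.Request(
        url=ENDPOINT + DETECT_PATH,
        data=body.encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Ocp-Apim-Subscription-Key": API_KEY,
        },
        method="POST",
    )
```

Sending the request would then be a matter of passing it to `urllib.request.urlopen` and parsing the JSON body of the response.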
Using time-series data and data anomalies
For Anomaly Detector to identify anomalies, it requires time-series data, which is a series of data points indexed in time-based order.
For example, your car might have embedded sensors that send information regarding engine health, speed, tire pressure, and gasoline capacity. This information about your car is constantly updated over time and, as such, it can be used as time‑series data.
Most data received over time can be converted into time-series data if it’s a consistent data sequence with a time stamp. Time-series data with a single variable is considered a univariate series, while time-series data with more than one variable is considered a multivariate time series. Anomaly Detector supports both univariate and multivariate series.
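To make the distinction concrete, here is a minimal Python sketch that turns evenly spaced readings into a univariate series of timestamp and value pairs. The readings are invented for illustration, `to_series` is a hypothetical helper, and the multivariate dictionary only illustrates the concept; it is not the input format of the multivariate API.

```python
from datetime import datetime, timedelta, timezone


def to_series(start, step, values):
    """Pair evenly spaced timestamps with values, in time order."""
    return [
        {
            "timestamp": (start + i * step).strftime("%Y-%m-%dT%H:%M:%SZ"),
            "value": float(v),
        }
        for i, v in enumerate(values)
    ]


# Invented tire-pressure readings (psi), one per hour: a univariate series.
start = datetime(2024, 1, 1, tzinfo=timezone.utc)
tire_pressure = to_series(start, timedelta(hours=1), [32.1, 32.0, 31.9, 27.4])

# A multivariate point carries several variables for the same timestamp.
multivariate_point = {
    "timestamp": tire_pressure[0]["timestamp"],
    "tire_pressure": 32.1,
    "speed": 88.0,
}
```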
A data anomaly is outlying data that doesn’t fit within expected boundaries. The graphic below plots the time-series data and highlights the anomaly points.
Data should fall within minimum and maximum boundaries. In the figure, the area between the boundaries is filled with a light color. Most of the data points are within the expected boundaries. The data points that exceed them, highlighted in red in the figure, are data anomalies.
For example, a stock price that drops below the expected limit is a data anomaly. If the temperature reading of a power plant core exceeds the acceptable limit, the reading is a data anomaly, and the technicians at the power plant should be immediately notified so that they can act based on the anomaly.
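The boundary idea can be sketched in a few lines of Python. This is a generic illustration of checking a value against an expected band, not the service's algorithm; the service works out expected values and margins itself, and `classify` and its inputs are hypothetical.

```python
def classify(values, expected, upper_margin, lower_margin):
    """Label each value 'above', 'below', or 'normal' relative to its band."""
    labels = []
    for v, e, up, low in zip(values, expected, upper_margin, lower_margin):
        if v > e + up:
            labels.append("above")   # exceeds the upper boundary
        elif v < e - low:
            labels.append("below")   # falls under the lower boundary
        else:
            labels.append("normal")  # inside the expected band
    return labels
```

For instance, with an expected value of 10 and margins of 5 on both sides, readings of 10, 25, and 3 would be labeled normal, above, and below respectively.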
Not all data anomalies are negative.
For example, if you have an article on your website that’s trending and experiencing larger traffic volume than normal, you likely want to be notified about the anomaly.
Or, if you have an e‑commerce website and receive a sudden spike in product demand, you, as the product supplier, should be notified so that you can act immediately. The graphic below contains examples of inputs and results for the Anomaly Detector service.
Using Microsoft Azure services to create a business solution
To enable integration with our SAP portfolio, we’ve implemented several decoupled software components. Each component has a specific use case, and we decouple business logic and the presentation layers to the extent possible. All application code is committed to a Microsoft Azure DevOps repository and is built as a Microsoft Azure-native solution.
- Microsoft Azure Web Apps. We host the front-end (presentation layer) application in an Azure Web App, from which the user can call the anomaly-detection service by using the prepared time-series data. Microsoft Azure Web App Service gives our developers the option to work in their preferred language, which can be .NET, .NET Core, Java, Ruby, Node.js, PHP, or Python. We protect the application endpoint with Microsoft Azure Active Directory for user authentication and authorization.
- Microsoft Azure Function Apps. We host all business-logic functionality in Azure Function Apps. We use two Azure Function Apps. The first is used to connect to Microsoft Azure Application Insights and capture SAP telemetry, such as customer or business-partner processes that need anomaly detection.
The Function App transforms the data into JavaScript Object Notation (JSON) format with time-series subformatting. The second Function App captures the precompiled time-series data from the first Function App, makes a call to the Anomaly Detector service, and then retrieves the result. The Web App presentation layer displays the results in a graph format. Function App endpoints are protected with access tokens.
- Application Insights. We store all SAP log data in Application Insights. This log data is posted from various business processes, including Customer Master Data creation, Business Partner Creation and updates, and batch program logs. These logs are the source for all anomaly detection.
- Microsoft Azure Anomaly Detector. Anomaly Detector detects and returns all anomaly points in the time-series data that the Function Apps send. Of the two options for interacting with Anomaly Detector, our developers chose to call the HTTP REST API directly rather than integrate through the client SDK. Using the API removes the limitation of using a single codebase and enables simple integration with any modern language that supports calling REST APIs through HTTP.
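The transformation step that the first Function App performs might look something like the following Python sketch, which buckets event timestamps into hourly counts and zero-fills empty hours so the series stays evenly spaced. The function name, the hourly granularity, and the assumption that timestamps are already in UTC are all illustrative; the article does not show the actual Function App code.

```python
from collections import Counter
from datetime import datetime, timedelta


def hourly_counts(event_times, start, hours):
    """Count events per hour and emit one point per hour, zero-filling gaps."""
    # Truncate each event time to the start of its hour, then tally.
    buckets = Counter(
        t.replace(minute=0, second=0, microsecond=0) for t in event_times
    )
    series = []
    for i in range(hours):
        hour = start + timedelta(hours=i)
        series.append({
            # Timestamps are assumed to be UTC already.
            "timestamp": hour.strftime("%Y-%m-%dT%H:%M:%SZ"),
            "value": buckets.get(hour, 0),
        })
    return {"series": series, "granularity": "hourly"}
```

The returned dictionary has the `series` and `granularity` shape that the univariate detection request body expects, so the second Function App could serialize it to JSON as is.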
Implementation architecture
As depicted in the graphic below, various SAP applications post their business-process logs into the Application Insights instance. The Web App hosts the core application, including the presentation layer and user interaction. The two Function Apps extract and process data from the Application Insights service and control interaction with the Anomaly Detector service. The Function Apps send the final results from the Anomaly Detector service for display and consumption in the Web App.
Business implementation and benefits
One of the key business processes that we onboarded to the Anomaly Detector–based solution was the Master Data Management (MDM) business-partner creation that uses SAP Master Data Governance (MDG).
We constantly create and update business-partner data in our SAP system via API calls from various upstream tenants and front-end systems. Based on incoming telemetry sources, the Anomaly Detector solution detects if there is a sudden drop in creation or update processes because of API failure or network issues.
The detection algorithm can detect these issues automatically, in real time, which helps our system users to take corrective action. This simple addition to the issue-detection process helps us supply a better customer experience and eliminates major negative effects on revenue.
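A sudden drop shows up in the batch-detection response as a negative anomaly. The sketch below, in Python, pairs the submitted series with the response's `isNegativeAnomaly` flags to list the affected timestamps. `find_drops` is a hypothetical helper, and any values passed to it here would be illustrative rather than real telemetry.

```python
def find_drops(series, result):
    """Return the timestamps that the service flagged as negative anomalies."""
    return [
        point["timestamp"]
        for point, is_drop in zip(series, result["isNegativeAnomaly"])
        if is_drop
    ]
```

A caller could pass the resulting list to whatever notification mechanism the team prefers, so that system users know which time windows to investigate.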
We’re planning to implement the same solution design across many other business processes, such as batch-job monitoring.
Currently, we have several hundred batch jobs that range from a runtime of a few seconds to several hours. It’s extremely difficult to monitor them manually and individually.
Sometimes, due to system issues or transaction locking, these jobs take longer than usual, which in turn affects downstream processes. Anomaly detection will play a critical role in detecting those issues, creating automatic alerts, and reducing manual monitoring.
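One possible shape for such an alert, sketched in Python, takes the response from the service's latest-point detection operation and produces a message only when the newest runtime is a positive anomaly. The function name, job name, and message format are hypothetical; this is a sketch of the planned use case, not an existing implementation.

```python
def runtime_alert(job_name, runtime_seconds, last_point_result):
    """Return an alert message if the latest runtime is a positive anomaly."""
    if not last_point_result.get("isPositiveAnomaly", False):
        return None  # runtime is within, or below, the expected range
    expected = last_point_result["expectedValue"]
    return (
        f"{job_name}: runtime {runtime_seconds:.0f}s exceeds "
        f"expected {expected:.0f}s"
    )
```

Checking only positive anomalies reflects the concern described above, which is jobs that run too long; a job that finishes unusually quickly would need a separate check on the negative-anomaly flag.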
This application has many potential use cases across multiple business scenarios. We’re planning to explore several of these use cases, including:
- SAP batch-job monitoring, evaluating long-running jobs and triggering alerts.
- Business-document processing and creation, such as sales orders, purchase orders, financial postings, and work orders.
- Any set of data that has time-series patterns. Data sets such as these can be evaluated and monitored for anomaly detection on a case-by-case basis.
Using Microsoft Azure Anomaly Detector has enabled us to quickly and efficiently build a solution to detect abnormalities in our SAP processes without having to know machine learning. The Anomaly Detector API’s algorithms help us to identify issues before they become problems, thereby proactively improving the performance, consistency, and reliability of our entire SAP landscape.