Monitoring SAP end to end on Azure

Sep 4, 2020

Microsoft Digital created a monitoring platform that uses Microsoft Azure monitoring tools to provide end-to-end business process and platform monitoring for one of the largest SAP instances in the world. Our monitoring platform helps our management teams keep SAP operating consistently and efficiently. The platform also helps key business users examine process flows and provides leadership a comprehensive assessment of enterprise-level business-process health.

Examining SAP at Microsoft

Like many enterprises, Microsoft uses SAP—the global enterprise resource planning (ERP) software solution—to run various business operations. Our SAP environment is critical to our business performance, and it’s directly connected to most of our business processes. SAP offers functionality for enterprise services at Microsoft, such as human resources, finance, supply-chain management, and commerce.

SAP provides an agile infrastructure, minimizing downtime, risks, and costs, and it improves employee efficiencies to power our digital transformation. Our enterprise SAP environment exists on a large scale. The size and scope statistics for ERP, our SAP production system, include:

  • 17 terabytes (TB) of highly compressed database storage (50 TB uncompressed)
  • 65 Azure virtual machines (VMs)
  • 110,000 internal users and 8,000 named accounts
  • 100 percent growth in the past two years
  • 9 million dialog steps per day
  • 300 million transactions per month
  • 300,000 monitored batch jobs per month
  • 0.4 seconds average user-response time
  • 99.998 percent infrastructure availability

We host 100 percent of our SAP environment in Azure. Running SAP on Azure allows us to leverage the breadth of Azure functionality and integrated SAP features as our business grows and changes. Additionally, Azure helps us combat infrastructure underutilization and overprovisioning, and allows us to quickly and easily scale up and scale down our SAP systems to meet our immediate business needs.

SAP infrastructure and operations monitoring is a massive undertaking. Relevant real-time and historical data from SAP ensures that our business runs efficiently. Our business leaders need to know the health of business processes that SAP drives and supports. Our engineering and service teams need to understand how our SAP infrastructure and application layers operate. Our partners and external teams need to understand the health of the interface between SAP and the external systems that they use. In all these scenarios, stakeholders need a manageable, accessible solution that provides the monitoring results that they require.

Monitoring SAP on Azure

Monitoring our SAP infrastructure begins with understanding how it operates. We approach SAP management and implementation at Microsoft based on four distinct layers. These layers define the functional, operational, and management separation that exists within our SAP environment:

  • Business-process layer. This layer defines the individual processes that SAP supports. Examples of these processes include sales orders, invoices, deliveries, and import/export functions. Each business process owner needs to maintain oversight for business-process functionality, efficiency, and health.
  • SAP application layer. This layer contains the individual SAP components that support business processes. These components include the SAP kernel, batch job processes, and queues. Our SAP management team needs to closely monitor application-layer components to understand how they affect the business processes that they support and how they’re affected by the underlying infrastructure.
  • SAP infrastructure layer. This layer includes the underlying virtual machines and other technical components that support the application layer. Our SAP management team needs to understand the technical operation of this layer to ensure that our systems are operating efficiently and without issues.
  • Web services and application programming interface (API) layer. This layer integrates with the application and infrastructure layers to connect SAP to upstream and downstream systems. The API layer creates a connection between SAP and the outside world that enables our entire organization to capture important data from SAP or feed external data into the SAP ecosystem. It also helps us integrate our business processes with other applications outside our SAP systems and facilitates data flow between environments and applications.

Figure 1 depicts the SAP monitoring multilayer telemetry structure.

Graphic representing the multilayer telemetry of SAP monitoring. Three layers are represented horizontally: business process, application, and infrastructure. Four main components are distributed across these layers: upstream apps, APIs and webservices, SAP, and downstream apps.
Figure 1. The multilayer telemetry approach to SAP monitoring

Solution architecture

Our solution architecture relies on Azure Monitor and supporting cloud technologies to fully instrument the infrastructure and application foundation layers of our SAP environment on Azure. At the most granular level, we monitor each of the more than 900 Azure virtual machines that run within our SAP environment. We use both Windows VMs running Microsoft SQL Server and Linux VMs running HANA. Our monitoring process includes the following:

  1. Data capture. For both platforms, the data-capture process is the same: The Azure Monitor Log Analytics agent installed on the VM captures the event and metric information from the EventLog in Windows or the Syslog in Linux. A large amount of SAP telemetry data is written to these logs by default, including both SAP system log information, such as short dumps or failed updates, and infrastructure data, such as CPU usage, network usage, and memory usage. For certain aspects of the environment that don’t write to the EventLog or Syslog, we create local jobs by using PowerShell Core or shell scripts that gather data from the appropriate location in the VM and write that information to the EventLog or Syslog.
  2. Data ingestion. Log Analytics is the repository for SAP monitoring telemetry. Log Analytics provides a scalable, easy-to-manage solution for storing and retrieving event and metrics data for any downstream alert or reporting functionality. We’ve established a common schema within Log Analytics that enables us to cross-correlate data across multiple SAP platforms, data sources, and usage scenarios. The Log Analytics agent writes event information to Log Analytics in a JavaScript Object Notation (JSON) format that is parsed and then broken down into data that adheres to the common schema. This process and schema are critical to ensuring end-to-end visibility and reporting across all aspects of SAP functionality.
  3. Alerting. We use Azure Monitor to configure and manage alerts throughout the monitoring environment. We maintain alerts across all four layers of SAP functionality, ensuring that the system identifies business process, application, infrastructure, and API issues quickly and notifies the owner(s) at the appropriate layer. We can easily aggregate and correlate alerts to identify dependencies and issues that could impact end-to-end functionality. Alerts can also trigger ticket creation in our enterprise service management tool to enable efficient issue resolution and documentation.
  4. Reporting and dashboards. We maintain two primary reporting paths for SAP monitoring. Both solutions use Microsoft Power BI, which provides a cloud-based platform that our service and business process owners can use for simple and effective reporting tasks.
    • Historical and trend reporting. Power BI reports directly against our Log Analytics repository for trend reporting. We maintain six months of historical data that enables weekly and monthly trend analysis. Reports and dashboards exist at each layer of the SAP environment, ensuring that each stakeholder has access to the information that they need to understand the health of their business process, infrastructure, application, or API component. We also generate roll-up reports and dashboards that provide oversight for business managers and executive-level stakeholders.
    • Real-time dashboards. Service owners and management teams need to understand the current state of the SAP environment. We provide dashboards for each layer of functionality and ownership. Dashboard users can drill down into specific dashboard areas or metrics for increased granularity or roll up to high-level overviews that provide enterprise SAP health data at a glance. Our real-time data is pulled from Log Analytics into Azure SQL Databases by using Azure Automation runbooks on a minute-by-minute basis. We use Power BI DirectQuery against the Azure SQL Databases to create easily customizable Power BI dashboards.
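
The data-capture and data-ingestion steps above can be illustrated with a minimal Python sketch. The common schema itself is not published here, so the `CommonEvent` fields, the `make_payload` helper, and the JSON payload layout are all hypothetical. The record keys (`TimeGenerated`, `Computer`, `SyslogMessage`, `RenderedDescription`) are assumed to mirror the column names of the Log Analytics Syslog and Event tables; the host names and values are made up for illustration.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class CommonEvent:
    """Hypothetical common-schema row (the real schema is not published)."""
    timestamp: str
    host: str
    platform: str  # "windows" or "linux"
    layer: str
    metric: str
    value: float


def make_payload(layer: str, metric: str, value: float) -> str:
    """Capture side: the JSON text a local job might write to the EventLog or Syslog."""
    return json.dumps({"layer": layer, "metric": metric, "value": value})


def normalize(record: dict) -> CommonEvent:
    """Ingestion side: map one platform-specific log record to the common schema."""
    if "SyslogMessage" in record:
        platform, body = "linux", record["SyslogMessage"]
    elif "RenderedDescription" in record:
        platform, body = "windows", record["RenderedDescription"]
    else:
        raise ValueError("unrecognized record shape")
    payload = json.loads(body)  # the message body is assumed to hold the JSON payload
    return CommonEvent(
        timestamp=record["TimeGenerated"],
        host=record["Computer"],
        platform=platform,
        layer=payload["layer"],
        metric=payload["metric"],
        value=float(payload["value"]),
    )


# Illustrative records only; host names and values are invented.
linux_record = {
    "TimeGenerated": "2020-09-04T10:00:00Z",
    "Computer": "hana-vm-01",
    "SyslogMessage": make_payload("infrastructure", "cpu_percent", 41.5),
}
windows_record = {
    "TimeGenerated": "2020-09-04T10:00:05Z",
    "Computer": "sql-vm-01",
    "RenderedDescription": make_payload("application", "failed_updates", 2),
}
events = [normalize(linux_record), normalize(windows_record)]
```

Because both platforms end up as the same row type, a single downstream query or report can treat Windows and Linux sources uniformly, which is the point of a common schema.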

Figure 2 depicts the SAP monitoring architecture.

An illustration of the SAP monitoring architecture. SAP sends data to Windows EventLog and Linux Syslog which are both imported into Azure Log Analytics. Azure Alerts, Azure Runbook, and Power BI dashboards take information from Log Analytics and send the data downstream.
Figure 2. SAP monitoring architecture
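
The alerting step described above, where alerts are routed to the owner at the appropriate layer and correlated to expose cross-layer dependencies, could look roughly like the following Python sketch. It does not use Azure Monitor APIs. The `OWNERS` table, the `Alert` fields, and the five-minute window are assumptions chosen for illustration, and the correlation rule (same system, close in time, more than one layer) is one simple heuristic rather than the method used in production.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical routing table: one owning team per monitoring layer.
OWNERS = {
    "business_process": "process-owner",
    "application": "sap-application-team",
    "infrastructure": "sap-infrastructure-team",
    "api": "integration-team",
}


@dataclass(frozen=True)
class Alert:
    minute: int   # minutes since an arbitrary start, to keep the example simple
    system: str   # SAP system the alert belongs to
    layer: str
    message: str


def route(alert: Alert) -> str:
    """Return the owner that should be notified for this alert's layer."""
    return OWNERS[alert.layer]


def correlate(alerts: list[Alert], window: int = 5) -> list[list[Alert]]:
    """Group alerts per system that fall within `window` minutes of the first
    alert in the group, and keep only groups that span more than one layer."""
    by_system = defaultdict(list)
    for alert in sorted(alerts, key=lambda a: a.minute):
        by_system[alert.system].append(alert)

    incidents = []
    for system_alerts in by_system.values():
        group = [system_alerts[0]]
        for alert in system_alerts[1:]:
            if alert.minute - group[0].minute <= window:
                group.append(alert)
                continue
            if len({a.layer for a in group}) > 1:
                incidents.append(group)
            group = [alert]
        if len({a.layer for a in group}) > 1:
            incidents.append(group)
    return incidents


alerts = [
    Alert(0, "ERP", "infrastructure", "CPU usage high"),
    Alert(2, "ERP", "application", "batch job delayed"),
    Alert(3, "ERP", "business_process", "invoice backlog growing"),
    Alert(40, "ERP", "api", "interface timeout"),
]
incidents = correlate(alerts)
```

In this example the first three alerts form one cross-layer incident, while the later API alert stands alone and is simply routed to its owner.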

Considerations and best practices

Our application of monitoring for SAP on Azure is a continually evolving process. We’ve identified several considerations and best practices that have helped us optimize the SAP monitoring environment, including:

  • Adopting a layered approach provides end-to-end high-level visibility and granular exposure of specific components. By building our monitoring solution around the four layers of SAP functionality, we were able to create a logical, business-focused approach to monitoring that always provides relevant information, regardless of the layer. Our engineers and support teams can gain the detailed, technical perspective that they need, while our business and executive stakeholders access dashboards that use practical business-oriented terminology and data.
  • Azure Monitor provides a single, scalable platform for all monitoring needs. Azure Monitor provided the core functionality for our SAP monitoring capability. It scales to our monitoring needs, integrates directly and intuitively with Azure components, and provides a single management and development environment for all our monitoring needs.
  • Monitoring the monitoring solution itself is critical. We put significant effort into ensuring that our SAP monitoring solution is running effectively and efficiently. As our SAP environment grows and changes, we track alert thresholds and critical data streams to provide the most accurate and relevant reporting and alerting to our end users.
  • Monitoring data can be used to measure and ensure desired state configuration. The depth of monitoring data that we collect allows us to examine aspects of environment configuration such as SQL or HANA database parameters, VM configuration, and application settings. By tracking this information, we can measure the configuration state across our environment to ensure that all components remain in a state that supports efficient SAP system operation.
  • Using a single platform provides universal benefit. By using a single platform to collect and monitor SAP data, we’ve created a comprehensive overview of our SAP environment from end to end. We can understand relationships and dependencies between SAP components and business processes, and all metrics and data come from a single source.
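
As an illustration of the desired-state point above, the following Python sketch compares observed configuration values against a desired baseline and reports drift per host. The parameter names, host names, and values are hypothetical; in practice the observed values would come from the collected monitoring data rather than from literals.

```python
def find_drift(desired: dict, observed: dict) -> dict:
    """Return {parameter: (desired, observed)} for each mismatched or missing value."""
    return {
        key: (want, observed.get(key))
        for key, want in desired.items()
        if observed.get(key) != want
    }


def drift_report(desired: dict, observed_by_host: dict) -> dict:
    """Return drift per host, omitting hosts that match the desired state."""
    report = {}
    for host, observed in observed_by_host.items():
        drift = find_drift(desired, observed)
        if drift:
            report[host] = drift
    return report


# Hypothetical baseline and observations, for illustration only.
desired = {"vm_size": "size-large", "db_max_memory_gb": 512, "log_backup_enabled": True}
observed_by_host = {
    "sap-db-01": {"vm_size": "size-large", "db_max_memory_gb": 512, "log_backup_enabled": True},
    "sap-db-02": {"vm_size": "size-large", "db_max_memory_gb": 256},
}
report = drift_report(desired, observed_by_host)
```

Here `sap-db-01` matches the baseline and is omitted, while `sap-db-02` is reported with one wrong value and one missing setting.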

Looking forward

As our SAP environment grows and changes, we’re examining new ways to improve our monitoring capabilities and create increased monitoring benefits across the SAP landscape. We’re incorporating predictive analytics into many of our monitoring streams to identify trends and perform predictive root-cause analysis for issues before they become a problem for our users. For example, identifying a sequence of events that frequently results in system downtime will allow us to recognize the event chain early and mitigate the issue quickly. We’re also investigating automation options for on-demand resource scaling, self-healing, and configuration-drift prevention.
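
The event-chain idea can be sketched in a few lines of Python. This is plain rule-based sequence matching, not a predictive model, and it assumes the risky chain has already been identified from historical data. The event names and the `warn_at` threshold are hypothetical.

```python
def chain_progress(events: list[str], chain: list[str]) -> int:
    """Return how many leading steps of `chain` appear, in order, within `events`.
    Unrelated events between chain steps are ignored."""
    matched = 0
    for event in events:
        if matched < len(chain) and event == chain[matched]:
            matched += 1
    return matched


def early_warning(events: list[str], chain: list[str], warn_at: int) -> bool:
    """Raise a warning once at least `warn_at` steps of the chain have been seen."""
    return chain_progress(events, chain) >= warn_at


# Hypothetical chain that has historically ended in downtime.
downtime_chain = ["disk_latency_high", "update_failed", "work_process_restart", "system_down"]
recent_events = ["cpu_high", "disk_latency_high", "batch_delayed", "update_failed"]

progress = chain_progress(recent_events, downtime_chain)
warn = early_warning(recent_events, downtime_chain, warn_at=2)
```

With two of the four steps already observed, the warning fires before the later steps occur, leaving time to mitigate.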

Conclusion

We’re continually refining and improving our SAP monitoring solution on Azure. The single-platform solution built on Azure Monitor enables us to keep key business users informed of business-process flow, provides a complete assessment of business-process health to our leadership, and helps our engineering teams create a more robust and efficient SAP environment. Telemetry and business-driven monitoring have transformed the visibility into our SAP environment on Azure, and our continuing journey toward deeper business insights, intelligence, and automation is making our entire business better.