{"id":67237,"date":"2023-05-03T14:38:53","date_gmt":"2023-05-03T13:38:53","guid":{"rendered":"https:\/\/www.microsoft.com\/en-gb\/industry\/blog\/?p=67237"},"modified":"2023-06-27T17:01:28","modified_gmt":"2023-06-27T16:01:28","slug":"a-practical-approach-to-monitoring-your-cloud-workloads","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-gb\/industry\/blog\/technetuk\/2023\/05\/03\/a-practical-approach-to-monitoring-your-cloud-workloads\/","title":{"rendered":"A Practical Approach to Monitoring Your Cloud Workloads"},"content":{"rendered":"\n
Being a Cloud Solution Architect is great. We become trusted advisors to customers across many different industries, helping them to be successful and get the best out of Microsoft Azure. A typical customer takes advantage of the many Azure resources available to them: they assemble those resources in the cloud to implement their workload, test it thoroughly, see it all working beautifully, and finally prepare to move it into production to start delivering business value. All is well with the world!<\/p>\n\n\n\n
As ‘go-live’ day approaches for your shiny new workload, the focus moves from ‘architectural excellence’ to ‘operational excellence’. Typically, at this point, lots of questions arise from the operational teams:<\/p>\n\n\n\n
The good news is that Microsoft has a lot of documentation and guidance to help, such as the Cloud Adoption Framework<\/a>, the Well-Architected Framework<\/a> and the Azure Architecture Center<\/a>. These can help you get started with your cloud adoption goals, and they contain a wealth of information on the many Azure monitoring tools.<\/p>\n\n\n\n This blog builds on all of that by proposing a prescriptive, practical approach that any team could implement to answer these questions and get you on the road to a solid monitoring solution tailored to your particular Azure workloads.<\/p>\n\n\n\n We start today with an overview of the process and, in the near future, we will release some specific example scenarios to bring this to life, addressing key areas such as monitoring for networks, monitoring for applications and monitoring for SAP.<\/p>\n\n\n\n Finally, remember that this is a continuous journey. Whilst this blog provides an approach that enables the implementation of a monitoring Minimum Viable Product (MVP), it should be followed by continuous review and refinement of your solution as it evolves and new requirements are identified.<\/p>\n\n\n\n Any workload that is deployed in the cloud is going to have a lot of component parts that combine to form the overall solution. It\u2019s no different to, say, a car that has wheels, a gearbox, an engine, a transmission and doors. All of those parts combine into the overall solution of being a mode of transport that gets you to work and back home. 
A cloud solution will have networking, maybe some virtual machines, some storage, and probably some platform services and applications, all of which need to be monitored so that you can understand how the solution is performing and, just as importantly, when a failure occurs, pinpoint exactly where the fault lies.<\/p>\n\n\n\n It is vitally important to identify and resolve anomalies quickly, and to keep the performance and availability of deployed solutions within your Service Level Agreements. Azure provides cloud-based tools that allow you to monitor across all levels of your software stack, plus the underlying compute, storage and networking components provided by Azure itself.<\/p>\n\n\n\n With such a wide range of monitoring points, the inevitable issue that all businesses face is understanding which monitoring services need to be combined to deliver that end-to-end visibility.<\/p>\n\n\n\n Understand that your monitoring strategy will evolve over time, and be careful not to delay go-live by trying to cover every base up front. Your first objective is to ensure “Observability”: capture the key information about your resources that allows you both to monitor your environment and to learn for future evolution.<\/p>\n\n\n\n Below are <strong>six steps<\/strong> that should be covered to build that baseline of observability:<\/p>\n\n\n\n This is an important first step: baseline your workload and, importantly, identify <strong>all<\/strong> services involved in the solution, from the underlying platform (networking, peerings, ingress\/egress appliances etc.) through resources (virtual machines, storage, databases, integration services, PaaS services etc.), up to the applications themselves. This is where we clearly define <strong>what<\/strong> we should be taking into consideration for an end-to-end monitoring solution. 
So the output here will likely be an architecture drawing and a spreadsheet listing all of the identified services.<\/p>\n\n\n\n Azure services already have a wealth of metrics, logs and insights available to use. So the proposal here is that, for each service identified in the previous stage, the monitoring options already available should be identified and listed. This gives a great starting point for the “<strong>what should we be monitoring for our Azure workload?<\/strong>” question. The output here will be a list of metrics, logs and monitoring services against each resource.<\/p>\n\n\n\n The previous step should provide food for thought when it comes to deciding what you may want to monitor, along with some of the things Microsoft would recommend you look at. However, it is likely you have your own ideas and requirements for what you want to monitor; some of these may be covered by the monitoring sources identified in step 2 and others may not. So this is a very important stage: assembling your monitoring requirements in a clear, unambiguous fashion.<\/p>\n\n\n\n You should be able to categorise monitoring requirements. For example, wanting to receive an alert email for a metric threshold breach is not the same as wanting a dashboard showing the variation in that metric over the last 90 days. So you could classify the former under an <strong>\u2018alert\u2019<\/strong> category whilst the latter falls under a <strong>\u2018performance\u2019<\/strong> category, and so on.<\/p>\n\n\n\n As a starting point, you should consider making a list of these <strong>‘User Stories’<\/strong>. A User Story describes a desired end state from the perspective of the person wanting the functionality, and it is widely used in software development as a small unit of work. This approach ensures that you capture the “<strong>who<\/strong>” as well as the “<strong>what<\/strong>” and “<strong>why<\/strong>” of each monitoring requirement. 
You can then categorise your stories into different sections, together with success criteria referred to as the <strong>‘Definition of Done’<\/strong> (DoD). This approach works very well for monitoring requirements. Here are some category examples:<\/p>\n\n\n\n With this approach you can write a monitoring requirement like this example: “As an operations engineer, I want to receive an email alert when a production virtual machine\u2019s CPU stays above 90% for ten minutes, so that I can intervene before users are affected” (DoD: the alert fires in a test and the email is received).<\/p>\n\n\n\n<h3><a><\/a>Where do I start?<\/h3>\n\n\n\n
<h3><a><\/a>A six step approach<\/h3>\n\n\n\n
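To make the first step concrete, here is a minimal sketch, in Python, of the kind of service inventory spreadsheet that step produces. The resource names, types and layers below are entirely hypothetical; in practice you would assemble this list from your architecture drawing and an export of the workload's resource groups.

```python
import csv
import io

# Hypothetical inventory for an example workload, spanning the three layers
# described above: platform, resources and applications.
inventory = [
    {"service": "hub-vnet",      "type": "Virtual Network",     "layer": "Platform"},
    {"service": "appgw-ingress", "type": "Application Gateway", "layer": "Platform"},
    {"service": "web-vm-01",     "type": "Virtual Machine",     "layer": "Resource"},
    {"service": "orders-sqldb",  "type": "Azure SQL Database",  "layer": "Resource"},
    {"service": "orders-api",    "type": "App Service",         "layer": "Application"},
]

def write_inventory(rows, out):
    """Write the service inventory as CSV, one row per identified service."""
    writer = csv.DictWriter(out, fieldnames=["service", "type", "layer"])
    writer.writeheader()
    writer.writerows(rows)

buf = io.StringIO()
write_inventory(inventory, buf)
print(buf.getvalue())
```

The resulting CSV is the "spreadsheet listing all of the identified services" mentioned earlier; step 2 then adds columns for the metrics, logs and insights available against each row.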
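And to illustrate step 3, a small sketch of how monitoring User Stories could be recorded with their who/what/why, a category and a Definition of Done, then grouped into a requirements register. The stories and categories here are illustrative examples, not a prescribed schema.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class MonitoringStory:
    """A monitoring requirement captured as a user story: who / what / why."""
    who: str
    what: str
    why: str
    category: str            # e.g. 'alert', 'performance'
    definition_of_done: str  # success criteria agreed with the team

stories = [
    MonitoringStory(
        who="operations engineer",
        what="receive an email alert when VM CPU exceeds 90% for 10 minutes",
        why="I can intervene before users notice degraded service",
        category="alert",
        definition_of_done="alert fires in a test and the email arrives",
    ),
    MonitoringStory(
        who="service owner",
        what="see a dashboard of database usage over the last 90 days",
        why="I can plan capacity ahead of seasonal peaks",
        category="performance",
        definition_of_done="dashboard published and reviewed weekly",
    ),
]

def group_by_category(items):
    """Group stories so each category becomes a section of the register."""
    register = defaultdict(list)
    for story in items:
        register[story.category].append(story)
    return dict(register)

register = group_by_category(stories)
for category, entries in register.items():
    print(f"[{category}] {len(entries)} requirement(s)")
```

Each section of the register can then be reviewed against the monitoring sources identified in step 2, making it obvious which requirements are already covered and which need additional tooling.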