Pete Apple, Author at Inside Track Blog

Microsoft moves IT infrastructure management to the cloud with Microsoft Azure

Pete Apple — Tue, 03 Sep 2024 16:08:04 +0000

We’re transforming our IT infrastructure management internally here at Microsoft.

At Microsoft Digital Employee Experience (MDEE), we’re embracing our digital transformation and the culture changes that come with it. With over 98 percent of our IT infrastructure in the cloud, we’re adopting Microsoft Azure monitoring, patching, backup, and security tools to create a customer-focused self-service management environment centered around Microsoft Azure DevOps and modern engineering principles. As we continue to benefit from the growing feature set of Azure management tools, we’ll deliver a fully automated, self-service management solution that gives us visibility over our entire IT environment.

The result?

Business groups at Microsoft will be able to adapt IT services to best fit their needs.

For a transcript, please view the video on YouTube: https://www.youtube.com/watch?v=C1PEhAT1Cns, select the “More actions” button (three dots icon) below the video, and then select “Show transcript.”

Microsoft experts share the processes and tools used to move our monitoring services into Azure. They discuss how we utilized solutions that use native Azure functionality to recreate certain SCOM functions and views in Azure Monitor. You will also learn how DevOps teams use log analytics to gain more visibility into end-to-end application performance.

Digital transformation at Microsoft

Our MDEE team is a global IT organization that strives to meet Microsoft business needs. Microsoft Azure is the default platform for our IT infrastructure. We host 98 percent of our IT infrastructure in the cloud. Here are a few details:

More than 220,000 employees
150 countries
587 locations
1,400 Azure subscriptions
1,600 Azure-based applications
17,000 Azure infrastructure-as-a-service (IaaS) virtual machines
643,000 managed devices

Like most IT organizations, we have our roots in the datacenter. In the past, our traditional hosting services were mostly physical, on-premises environments that consisted of servers, storage, and network devices. Most of the devices were owned and maintained for specific business functions. Technologies were very diverse and needed people with specialized skills to design, deploy, and run them. Our achievements were limited by the time required to plan and implement the infrastructure to support the business.

As technology evolved, we began to move out of the datacenter and into the cloud. Cloud-based infrastructure created new opportunities for us and has transformed the IT infrastructure we manage. We continue to grow and adapt in a constantly changing IT landscape.

Traditional IT technologies, processes, and teams

Our traditional datacenters were managed by a legion of IT pros, who supported the diverse platforms and systems that made up our infrastructure. Physical servers, and later virtual servers, numbered in the tens of thousands, spanning multiple datacenters and comprising a mass of metal and silicon to be managed and maintained. Platform technologies ranged from Windows, SQL Server, BizTalk, and SharePoint farms to third-party solutions such as SAP and other information security-related tool sets. Server virtualization evolved from Hyper-V to System Center Virtual Machine Manager and System Center Orchestrator.

To provide a stable infrastructure, we used structured frameworks, such as IT Infrastructure Library/Microsoft Operations Framework (ITIL/MOF). Policies, processes, and procedures in the framework helped to enforce and control security and availability, and to prevent failures. Microsoft product engineering groups that used hosting services had a similar adoption process for their application and service needs, which were based on ITIL/MOF.

This model worked well for traditional IT infrastructure, but things began to change when cloud computing and Microsoft Azure began to influence the IT landscape.

Evolution of the hybrid cloud

As IT infrastructure and services began to move to the cloud, the nature of the cloud and how we treat it changed. We’ve now been hosting IT services in Microsoft Azure for a long time, and as Azure has evolved and grown, so has our engagement with Azure services and the volume of our IT services hosted in Azure.

Early Azure: IT-owned, IaaS, and lift-and-shift

In the early years, Microsoft Azure was IT only. We had full control of cloud development, implementation, and management. We could create and manage solutions in Azure, but it was a siloed service.

The infrastructure consisted primarily of IaaS virtual machines that hosted workloads in the cloud the same way that they hosted workloads in on-premises datacenters.

Efficiency gains were small and infrastructure management still used the same tools—sometimes hosted in the cloud and sometimes hosted on-premises and connected to the cloud. It was very much a lift-and-shift migration from the datacenter to the cloud, and our management processes imitated the on‑premises model in much the same way.

The datacenter remained the focus, but that was changing.

Microsoft Azure evolves: PaaS, co-ownership, and cloud-first

As Microsoft Azure matured and more of our infrastructure and services moved to the cloud, we began to move away from IT‑owned applications and services. The strengths of the Azure self-service and management features meant that a business group could handle many of the duties that we offered as an IT service provider—which meant that they could build solutions that were more agile and responsive to their needs.

Microsoft Azure platform-as-a-service (PaaS) functionality matured, and the focus moved from IaaS-based solutions to PaaS-based solutions. Azure became the default target for IT solutions; datacenter decommissioning began as more solutions moved to or were created in Azure. Monitoring and management were becoming cloud-focused as we pointed more of our System Center Operations Manager (SCOM) and System Center Configuration Manager (SCCM) instances at the cloud. Azure-native management started to mature.

Large-scale Azure: Service line–owned, IT-managed, PaaS-first

PaaS quickly became a focus for developers in our business groups, as they realized the agility and scalability they could achieve with PaaS-based solutions. Those developers shifted to PaaS for applications as we transitioned away from IaaS and virtual machine-based solutions.

With the advent of Microsoft Azure Resource Manager, which permitted a broader level of user control over Microsoft Azure services, we saw service lines begin to take ownership of their solutions, and business groups started to manage their own Azure resources.

The datacenter became an inconvenient necessity for apps that couldn’t move to Microsoft Azure. We still used SCOM and SCCM as the primary monitoring and management tools, but we had moved almost all our instances into IaaS implementations in Azure. Azure-native management became a mature product, and we started to plan and deliver a completely cloud-based management environment.

Microsoft Azure in a DevOps culture: Service line–managed, Internet-first, business-first

We’re continually nurturing a DevOps culture. DevOps has transformed the way that Microsoft Azure solutions are developed and operated. Our Azure solutions offer an end-to-end view for our business groups. They’re agile, dynamic, and data-intensive. Continuous integration and continuous delivery create a continual state of improvements and feature releases.

The Microsoft Azure solutions that our business groups use are designed to respond to their business needs. We actively seek and use Azure-native tools for control over and insight into IT environments, in Azure first but also back in the datacenter where required. We’re a long, long way past managing a stack of metal. The modern workplace is here at Microsoft, and it changes every day.

Realizing digital transformation

In the modern workplace, the developers and IT decision makers in our business groups have an increasingly critical business role. Our business groups need the autonomy to make IT decisions that serve their business needs in the best way possible. With 98 percent of our IT infrastructure in Microsoft Azure, we’re increasingly looking to the agility, scale, and manageability that Azure provides. Using this scale, we solve business needs and provide the framework for a complete IT organization, from infrastructure to development to management.

Managing the modern hybrid cloud

Our modern hybrid cloud is 98 percent Microsoft Azure—and Azure is the primary platform for infrastructure and management tools. Azure is not only the default platform for IT solutions—it is our IT solution.

Just as PC sprawl occurred in the late 1990s and server sprawl did the same thing in the 2000s, cloud sprawl is a growing reality. Implementing new cloud solutions to manage the cloud environment and the remaining on-premises infrastructure is critical for our organization. The scope of these new cloud solutions includes the flexibility for our engineers to use PaaS, function, and container models to optimize the management of cloud environments.

Embracing decentralized IT

Decentralized IT services are a big part of digital transformation. We need a management solution that offers us—and our business groups—what we need to manage our IT environments. We always want to maintain governance over security and compliance of Microsoft as a whole, but we also realize that decentralized IT services are the most suitable model for a cloud-first organization.

By decentralizing services and ownership in Microsoft Azure, we offer our business groups several benefits:

Greater DevOps flexibility.
A native cloud experience: subscription owners can use features as soon as they’re available.
Freedom to choose from marketplace solutions.
Minimal subscription limit issues.
Greater control over groups and permissions.
Greater control over Microsoft Azure provisioning and subscriptions.
Business group ownership of billing and capacity management.

Our goal in the management of modern hybrid cloud continues to be a solution that transforms IT tasks into self-service native cloud solutions for monitoring, management, backup, and security across our entire environment. With this solution, our business groups and service lines have reliable, standardized management tools, and we can maintain control over and visibility into security and compliance for our entire organization.

The areas where we retain oversight include:

General IT and operational policy implementation, as approved by the subscription owner. Areas include compliance, operations, and incident management.
Shared network connectivity over Microsoft Azure ExpressRoute, as needed.
Visibility into infrastructure inefficiencies and self-service tool development.

Our management solution must be as agile as the solutions we manage, and we provide best practices, standards, and consulting for Microsoft Azure management solutions to ensure that our business groups are getting the most out of the platform.

Supporting digital transformation with Microsoft Azure management tools

Managing the hybrid cloud in Microsoft Azure encompasses a wide range of services and activities. For our business groups to improve, they need to monitor their apps and solutions to recognize issues and opportunities. They need a patching and management solution that keeps systems up to date, manages configuration, and automates common maintenance tasks.

We must protect data with a disaster recovery platform and ensure security and compliance for business groups and the entire company. We use the management tools in Microsoft Azure to enable hybrid cloud management.

Monitoring the hybrid cloud

Monitoring is an essential task for our business groups and their service lines. They need to understand how their apps are performing (or not performing) and have insight into their environment. We’ve used SCOM for monitoring at Microsoft for more than 10 years—and a certain rhythm develops when you use a product for that long.

To ease the transition from SCOM to Microsoft Azure monitoring, we developed transition solutions that use native Azure functionality to recreate certain SCOM functions and views in Microsoft Azure Monitor.

The transition solutions consist primarily of PowerShell scripts and documentation. They give our business groups a familiar environment to work in while they become familiar with Microsoft Azure monitoring.

Our business groups can also start in a standardized environment with our built-in tested security and compliance components. This helps us maintain a centralized standard while allowing for decentralized monitoring. We maintain metrics for critical organizational services, but we leave operational monitoring to each business group.

Our Microsoft Azure monitoring is designed to:

Create visibility. We’re providing instant access to a foundational set of metrics, alerts, and notifications across core Azure services for all business units. Microsoft Azure monitoring also covers production and non-production environments, as well as native monitoring support across Microsoft Azure DevOps.
Provide insight. Business groups and service lines can view rich analytics and diagnostics across applications, as well as compute, storage, and network resources, including anomaly detection and proactive alerting.
Enable optimization. Monitoring results help our business groups and service lines understand how users are engaging with their applications, identify sticking points, develop cohorts, and optimize the business impact of their solutions.
Deliver extensibility. Our monitoring is designed for extensibility, so it can support custom event ingestion and broader analytics scenarios.

We’ve now retired our SCOM environment, leaving Microsoft Azure monitoring as the default for both cloud and on-premises monitoring. We’re now focusing on:

Automated installation and repair of the Microsoft Monitoring Agent using Microsoft Azure Runbooks.
Centralized visibility into comprehensive health and performance.
Fully featured transition solution development to enable complete self-service monitoring in Microsoft Azure.
Complete transition from SCOM to Microsoft Azure.

Patching, updating, and inventory management

As we’ve done for monitoring, we’re using transition solutions to make it easier for business groups to transition from previously used on-premises tools to Microsoft Azure.

Our patching processes depended on our preexisting solutions as we worked through the transition to Microsoft Azure. SCCM and associated agents provided the bulk of our patching, software distribution, and management process, but we’ve moved to Azure in a phased approach as our Azure subscriptions become ready to transition to Azure for management.

We’ve built transition solutions for our business groups to help them transition from the SCCM platform and other legacy tools to the Microsoft Azure update management patching service. We’re maintaining and modifying these transition solutions as Azure features replace the on-premises functionality.

From a patching and management perspective, we’re focusing on:

The transition of inventory management from Configuration Manager to Microsoft Azure, including discovery, tracking, and management of IT assets.
Transition of our update processes to Microsoft Azure Update Management for business groups.
Enabling self-service patch management. We’re developing an orchestrated deployment of operating system and application updates with Microsoft Azure, including centralized compliance reporting.
Creating and updating solutions to support the transition of the above areas, including Resource Manager templates, PowerShell scripts, documentation, and Microsoft Azure Desired State Configuration.

The design for patching and management, as with monitoring, is to provide a Microsoft Azure-based self-service solution for our business groups that gives them control over their patching and management environment while giving us the ability to centrally monitor for compliance and security purposes.

Ensuring recoverable data

With Microsoft Azure as the primary repository for business data, it’s extremely important to have an Azure backup solution with which our business groups and service lines can safeguard, retain, and recover their data.

Our data recovery solutions address the following major areas of concern:

Recover business data from attacks by malicious software or malicious activity.
Recover from accidental deletion or data corruption.
Secure critical business data.
Maintain compliance standards.
Provide historical data recovery requirements for legal purposes.

Our Microsoft Azure data footprint is immense. We currently host 1.5 petabytes of raw data in Azure and use almost nine petabytes of storage to back up that data.

We’re using Microsoft Azure Backup as a self-service solution. It gives business groups more control over how they perform their backups and gives them responsibility for backing up their business data—because each business group knows its data better than anyone else.

We’re also using Microsoft Azure Backup for virtual machine-level backup, and we’re backing up some on-premises data to Microsoft Azure using Microsoft Azure Recovery Services vaults. We’ve created a packaged solution for backup management in Azure that consists of scripts and documentation—our business groups can use it to migrate to Azure Backup quickly and efficiently.

As with other areas of enterprise management, we’re evaluating new features for Microsoft Azure Backup that will offer more backup capabilities to our business groups.

Embedding security and compliance

Decentralization gets the greatest scrutiny when it comes to security and compliance. We’re responsible for security and legal compliance for the organization, so our security controls are the most centralized of all the cloud management solutions we implement. However, centralization does not directly affect day-to-day solution management for our business groups and their service lines.

We apply a broad set of security and compliance practices and tools across all Microsoft Azure subscriptions. The following imperatives govern how we apply them:

Microsoft Azure Policy. Using Azure Policy, we establish guardrails in subscriptions that keep our service engineers within governance boundaries automatically. Policy can help control a myriad of settings by default, including limiting the network configurations to safe patterns, controlling the regions and types of Microsoft Azure resources available for use, and ensuring data is stored with encryption enabled.
Automation gives us a chance to keep pace with the constantly changing cloud environment. DevOps is heavily centered on end-to-end automation, and we need to complement DevOps automation with automated security. Automated security saves significant time and cost for apps that are frequently updated, and we can quickly and consistently configure and deploy security.
Empower engineering teams. In an environment where change is constant, we want to empower our engineering teams to make meaningful, consistent changes without waiting for a central security team to approve an app. Our engineers need the ability to integrate security into the DevOps workflow, so that they don’t have to take extra measures to be secure.
Maintain continuous assurance. When development and deployment are continuous, everything that goes with them needs to follow suit—including security assurance. The old requirements for sign-offs or compliance checks create tension in the modern engineering environment. We want to define a security state and track drift from that state to maintain a consistent level of security assurance across the entire environment. This helps ensure that builds and deployments that are secure when they’re delivered stay secure from one release iteration to the next and beyond.
Set up operational hygiene. We need a clear view of our DevOps environment to ensure operational hygiene. In addition to understanding operational risks in the cloud, DevOps operational hygiene in the cloud requires a different perspective. We need to create the ability to see the security state across DevOps stages and establish capabilities to receive security alerts and reminders for important periodic activities.
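
The "define a security state and track drift" idea from the continuous assurance item can be illustrated with a small comparison routine. This is only a minimal sketch: the setting names (`storage_encryption`, `allowed_regions`, `public_ip_allowed`) and the `find_drift` helper are hypothetical, chosen for illustration, and are not taken from any Microsoft tool.

```python
from typing import Any, Dict, Tuple


def find_drift(desired: Dict[str, Any], actual: Dict[str, Any]) -> Dict[str, Tuple[Any, Any]]:
    """Return {setting: (desired_value, actual_value)} for every setting that differs.

    A setting that is missing from `actual` is reported with an actual value of None.
    """
    drift = {}
    for setting, wanted in desired.items():
        found = actual.get(setting)
        if found != wanted:
            drift[setting] = (wanted, found)
    return drift


# Hypothetical baseline, for illustration only.
BASELINE = {
    "storage_encryption": True,
    "allowed_regions": ["westus", "eastus"],
    "public_ip_allowed": False,
}

if __name__ == "__main__":
    deployed = {
        "storage_encryption": True,
        "allowed_regions": ["westus"],
        "public_ip_allowed": True,
    }
    for name, (wanted, found) in find_drift(BASELINE, deployed).items():
        print(f"{name}: expected {wanted!r}, found {found!r}")
```

Running a check like this after every release, rather than at a one-time sign-off, is one way to read the "stay secure from one release iteration to the next" goal.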

At MDEE, our goal is a completely cloud-based, self-service management solution that gives our business groups concise control over their environments using Microsoft Azure tools and features. We’ll continue to offer updated Azure-based solutions, transitioning away from on-premises, System Center–based management.

As we continue to transition business groups to cloud-based monitoring, we’re growing our feature set and making our Microsoft Azure-based management even better. We envision a near future where our management systems will be completely cloud based, decentralized, and automated, and where our organization continues to build its business in Azure.

The post Microsoft moves IT infrastructure management to the cloud with Microsoft Azure appeared first on Inside Track Blog.

Microsoft uses a scream test to silence its unused servers

Pete Apple — Sat, 17 Aug 2024 08:00:59 +0000

Do you have unused servers on your hands? Don’t be alarmed if I scream about it. It’ll be for a good reason (and not just because it’s almost Halloween)!

Check out Pete Apple’s expedition to the cloud series

I talked previously about our efforts here in Microsoft Digital to inventory our internal-to-Microsoft on-premises environments to determine application relationships (mapping Microsoft’s expedition to the cloud with good cartography) as well as look at performance info for each system (the awesome ugly truth about decentralizing operations at Microsoft with a DevOps model).

With this info, it was time to begin making plans to move to the cloud. Looking at the data, our overall CPU usage for on-premises systems was far lower than we thought—averaging around six percent! We realized this was so low due to many underutilized systems. First things first, what to do with the systems that were “frozen,” or not being used, based upon the 0-2 percent CPU they were utilizing 24/7?

We created a plan to closely examine those assets with the goal of moving as few as possible. We used our home-built configuration management database (CMDB) to check whether there was a recorded owner. In some cases, we were able to work with that owner and retire the system.

Before we turned even one server off, we had to be sure it wasn’t being used. (If a server is turned off and no one is there to see it, does it make a sound?)

Developing a scream test

Pete Apple, a cloud services engineer in Microsoft Digital, shares how Microsoft scares teams that have unused servers that need to be turned off. (Photo by Jim Adams | Inside Track)

But what if the owner information was wrong? Or what if that person had moved on? For those, we created a new process: the Scream Test. (Bwahahahahaaaa!)

What’s the Scream Test? Well, in our case it was a multistep process:

Display the message “Hey, is this your server? Contact us.” on the sign-in splash page for two weeks.
Restart the server once each day for two weeks to see whether someone opens a ticket (in other words, screams).
Shut down the server for two weeks and see whether someone opens a ticket. (Again, whether they scream.)
Retire the server, retaining the storage for a period, just in case.
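
The four steps above can be sketched as a simple back-to-back schedule. This is an illustrative sketch, not the tooling we used: the `Stage` class and helper names are hypothetical, and because the post does not say how long storage was retained, the 30-day value is a placeholder assumption.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Stage:
    name: str
    action: str
    days: int  # how long to wait for a ticket before moving on


# The first three stages last two weeks each, as described above.
# The retention period is NOT stated in the post; 30 days is a placeholder.
SCREAM_TEST = [
    Stage("banner", "show a 'contact us' message on the sign-in page", 14),
    Stage("restart", "restart the server once a day", 14),
    Stage("shutdown", "shut the server down", 14),
    Stage("retire", "retire the server but keep its storage", 30),
]


def schedule(start: date) -> List[Tuple[str, date, date]]:
    """Return (stage name, first day, last day) for each stage, back to back."""
    plan = []
    day = start
    for stage in SCREAM_TEST:
        last = day + timedelta(days=stage.days - 1)
        plan.append((stage.name, day, last))
        day = last + timedelta(days=1)
    return plan


def stage_on(day: date, start: date, ticket_opened: Optional[date] = None) -> str:
    """Return the active stage on `day`, or 'stopped' once someone has screamed."""
    if ticket_opened is not None and ticket_opened <= day:
        return "stopped"
    for name, first, last in schedule(start):
        if first <= day <= last:
            return name
    return "done" if day > start else "not started"
```

The key design point is that a ticket at any stage halts the process, so a server only reaches retirement after six quiet weeks.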

With this effort, we were able to retire far more unused servers—around 15 percent—than we had expected, without worrying about moving them to the cloud. Winning! We also were able to reclaim more resources on some of the Hyper-V hosts that were slated to continue running on-premises. And as a final benefit, we cleaned up our CMDB a bit!

In parallel, we initiated an effort to look at some of the systems that were infrequently used or used a very low level of CPU (less than 10 percent, or “Cold”). From that, we had two outcomes that proved critical for our successful migration to the cloud.

The first was to identify the systems in our on-premises environments that were oversized. People had purchased physical machines or sized virtual machines according to what they thought the load would be, and either that estimate was incorrect or the load diminished over time. We took this data and created a set of recommended Azure VM sizes for every on-premises system to use for migration. In other words, we downsized on the way to the cloud versus after the fact.
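
As a rough illustration of the CPU bands mentioned in this post (0-2 percent for "frozen," under 10 percent for "cold") and of downsizing before migration, here is a minimal sketch. The "active" label, the 60 percent target utilization, and the `recommended_cores` formula are assumptions made for this example. They are not the rules we used, and the sketch does not map to real Azure VM sizes.

```python
import math

FROZEN_MAX_CPU = 2.0  # percent; the "frozen" band from the post (0-2 percent)
COLD_MAX_CPU = 10.0   # percent; the "cold" band from the post (less than 10 percent)


def classify(avg_cpu_percent: float) -> str:
    """Bucket a system by its average CPU usage."""
    if avg_cpu_percent <= FROZEN_MAX_CPU:
        return "frozen"
    if avg_cpu_percent < COLD_MAX_CPU:
        return "cold"
    return "active"  # label assumed for everything above the cold band


def recommended_cores(current_cores: int, peak_cpu_percent: float,
                      target_percent: float = 60.0) -> int:
    """Estimate the cores needed so the observed peak lands near the target utilization."""
    needed = current_cores * peak_cpu_percent / target_percent
    return max(1, math.ceil(needed))


if __name__ == "__main__":
    # A 16-core machine that never peaks above 15 percent CPU.
    print(classify(6.0), recommended_cores(16, 15.0))
```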

At the time, we did much of this work by hand because we were early adopters. Microsoft now has a number of products that assist with this inventory and review of your on-premises environment. To learn more, see the documentation on Azure Migrate.

Another statistic that the data revealed was the number of systems that were used for only a few days or a week out of each month. Development machines, test/QA machines, and user acceptance testing machines reserved for final verification before moving code to production were used for only short periods. The machines were on continuously in the datacenter, mind you, but they were actually being used for only short periods each month.
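
To get a feel for how much running time is at stake, the arithmetic can be sketched as follows. The function names and the 30-day month are assumptions for illustration, and the sketch counts whole days of use only.

```python
HOURS_PER_DAY = 24


def idle_fraction(days_used: int, days_in_month: int = 30) -> float:
    """Fraction of the month a machine is powered on but not in use."""
    if not 0 <= days_used <= days_in_month:
        raise ValueError("days_used must be between 0 and days_in_month")
    return (days_in_month - days_used) / days_in_month


def hours_saved(days_used: int, days_in_month: int = 30) -> int:
    """Hours per month that could be avoided by running only on the days in use."""
    if not 0 <= days_used <= days_in_month:
        raise ValueError("days_used must be between 0 and days_in_month")
    return (days_in_month - days_used) * HOURS_PER_DAY


if __name__ == "__main__":
    # A test machine that is needed five days a month.
    print(hours_saved(5), round(idle_fraction(5), 2))
```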

For these, we investigated ways to have those systems running only when required by investing in two technologies: Azure Resource Manager Templates and Azure Automation. But this is a story for the next time. Until then, happy Halloween!

The post Microsoft uses a scream test to silence its unused servers appeared first on Inside Track Blog.

Mapping Microsoft’s expedition to the cloud with good cartography

Pete Apple — Thu, 09 Nov 2023 17:08:52 +0000

When you’re charged with mapping Microsoft’s expedition to the cloud, sometimes it’s best to go back to the basics, like using an old-fashioned map to help you find your way.

An explorer in his own space, famous British film director Peter Greenaway once noted the significance of maps and cartography. “A map tells you where you’ve been, where you are, and where you’re going,” he noted in fascination. “In a sense, it’s three tenses in one.”

As a pioneer myself (in digital transformation), I couldn’t agree more. In my last blog post, I shared with you the awesomely ugly truth about how we decentralized operations at Microsoft and the intricacies and nuances we experienced as we adopted the Microsoft Azure DevOps model.

In talking to many of our customers, I know some of you are just starting out on your own cloud computing journey. So, let’s go back in time to the very beginning and share what happened the exact moment when our leadership gave the orders to start on Microsoft’s expedition to the cloud.

Microsoft Digital, our IT organization, is split into horizontal services (compute, storage, network, security) and vertical Line of Business (LOB) teams that provide solutions to our internal end users (Finance, HR, etc.). As the horizontal, our job is to ensure that our application teams have the appropriate computing systems and that those assets are tracked for cost and inventory purposes.

For a transcript, please view the video on YouTube: https://www.youtube.com/watch?v=jMGmL0B-4YQ, select the “More actions” button (three dots icon) below the video, and then select “Show transcript.”

Pete Apple unpacks Microsoft’s journey to the cloud via Microsoft Azure and his advice for optimization and change management.

When we got the announcement from management to start moving assets to the cloud, we simply did not know where to begin. Our first thought was to grab some “low hanging fruit” by targeting servers going out of service. We took a hard look at our physical and virtual inventory and soon realized that we weren’t even sure what was there.

Pete Apple shares a lighthearted moment illustrating the importance mapping played in driving Microsoft’s expedition to the cloud. Apple is a cloud services engineer in Microsoft Digital. (Photo by Jim Adams | Inside Track)

My team took this opportunity to evaluate our inventory processes and assess how inaccurate our Configuration Management Database (CMDB) was—very! We found systems in datacenters that didn’t have any records. We found records in the CMDB for systems that no longer existed. Cleaning this up became someone’s part-time job (when it really could have been a full-time one).
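
The two kinds of mismatch described here (systems with no record, and records with no system) amount to a pair of set differences. The sketch below assumes you already have a list of hostnames from the CMDB and another from a datacenter scan. The function name and the sample hostnames are hypothetical.

```python
from typing import Dict, Iterable, Set


def normalize(names: Iterable[str]) -> Set[str]:
    """Lowercase and trim hostnames so trivial differences don't count as mismatches."""
    return {name.strip().lower() for name in names if name.strip()}


def reconcile(cmdb_records: Iterable[str], discovered: Iterable[str]) -> Dict[str, Set[str]]:
    """Compare CMDB records against systems actually found in the datacenter."""
    recorded = normalize(cmdb_records)
    found = normalize(discovered)
    return {
        "missing_from_cmdb": found - recorded,  # systems running with no record
        "stale_in_cmdb": recorded - found,      # records for systems that no longer exist
        "matched": recorded & found,
    }


if __name__ == "__main__":
    report = reconcile(["FIN-DB01", "hr-web02", "old-app09"], ["fin-db01", "hr-web02 ", "lab-test7"])
    for bucket, hosts in report.items():
        print(bucket, sorted(hosts))
```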

One of the very first lessons we learned was that you can’t understand what applications you need to move if you don’t know what applications you have.

To move forward, we broke down the inventory effort by vertical organization and paired a representative from each LOB with a designated person from our team. With the help of Microsoft Azure Service Map, we were able to scan each LOB application and identify which systems it used and which other applications it relied upon, which let us build a more robust dependency map.

This is an important step to take because, as you move applications into the cloud, systems that are next to each other in on-premises datacenters might end up in two different Microsoft Azure datacenters, creating unexpected latency that the team might not have accounted for in the design. Understanding this relationship ahead of time will help you decide which Microsoft Azure datacenter each application should go into and keep that latency down.

A good example of this is when we moved a financial database that many other applications depended upon. If we moved that critical application’s servers into the Microsoft Azure US West region, we wanted to ensure the dependent applications would end up there too, or otherwise consider the possibility of changed latency for calls to that data. Similarly, if the critical database had a disaster recovery setup in the US East region, it made sense to map the dependent applications to that same region for disaster recovery.
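
The co-location rule in this example can be sketched as a walk over the dependency map: pin the shared database to a region, then place everything that depends on it, directly or indirectly, in the same region. This is an illustrative sketch only. The application names, the `assign_regions` helper, and the "first placement wins" rule are assumptions, and the dependency map is assumed to come from your own discovery tooling.

```python
from collections import deque
from typing import Dict, List


def assign_regions(depends_on: Dict[str, List[str]], pinned: Dict[str, str]) -> Dict[str, str]:
    """Place each application in the region of a service it depends on.

    depends_on maps an application to the services it calls.
    pinned maps already-placed services (for example, a shared database) to a region.
    Applications with no path to a pinned service are left unplaced.
    """
    # Invert the map: service -> applications that call it.
    dependents: Dict[str, List[str]] = {}
    for app, services in depends_on.items():
        for service in services:
            dependents.setdefault(service, []).append(app)

    placement = dict(pinned)
    queue = deque(pinned)
    while queue:
        service = queue.popleft()
        for app in dependents.get(service, []):
            if app not in placement:  # first placement wins; conflicts need a human decision
                placement[app] = placement[service]
                queue.append(app)
    return placement


if __name__ == "__main__":
    deps = {"ledger-ui": ["finance-db"], "reports": ["ledger-ui"], "hr-portal": ["hr-db"]}
    print(assign_regions(deps, {"finance-db": "westus"}))
```

The same walk could be run a second time with the database’s disaster recovery region pinned, to produce the matching disaster recovery placement.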

With this approach, we were able to begin our “cloud cartography plans” and map our inventory in on-premises datacenters and plan their final destinations for migrating into Microsoft Azure. We now knew where they had been, where they were right now, and where they needed to go!

And then…during the cartography process we discovered an interesting fact. Maybe we didn’t need to move as much as we originally thought? More on that next time…

The post Mapping Microsoft’s expedition to the cloud with good cartography appeared first on Inside Track Blog.

Managing Microsoft Azure solutions on Microsoft’s expedition to the cloud

Pete Apple — Wed, 08 Nov 2023 16:16:11 +0000

A very popular cliché used in Silicon Valley, the notion of having to “ship it and fix it and ship it again,” was all too familiar to my team as we focused our efforts on moving, managing, and monitoring solutions in Microsoft’s expedition to the cloud.

Hello again and welcome back to our blog series on how our team helped Microsoft move most of Microsoft’s internal workloads to the cloud and Microsoft Azure. My team in Microsoft Digital, the organization that powers, protects, and transforms Microsoft, is the primary horizontal infrastructure group and we’re responsible for ensuring our internal customers have servers, storage, and databases, all the hard-crunchy bits of hosting, to run the critical applications that make the company operate internally.

It became clear we were going to have to hybridize our management solution if we were going to get Microsoft’s expedition to the cloud right.

– Pete Apple, cloud services engineer, Microsoft Digital

In this blog post I want to share what it took for us to effectively migrate solutions from on-premises to the cloud while managing and monitoring them for day-to-day operations. Go here to read the first blog in our series: The learnings, pitfalls, and compromises of Microsoft’s expedition to the cloud.

When I was running the hosting environment on-premises, our physical and virtual machine (VM) footprint was spread across multiple geographic datacenters, in two primary security zones: “corporate” and “DMZ.” Corporate refers to our internally facing services that our own employees use day to day for their jobs, while the DMZ holds our partner-facing services that interact with the outside world. You might have a similar environment.

We used Microsoft System Center Operations Manager (SCOM) for monitoring and Microsoft System Center Configuration Manager (SCCM) for patching (this set of tools has been combined into Microsoft Endpoint Configuration Manager). As we started to look at moving solutions over to Microsoft Azure, it became clear we were going to have to hybridize our management solution if we were going to get Microsoft’s expedition to the cloud right.

Microsoft Azure ExpressRoute allowed us to “lift and shift” many of our on-premises VMs to the cloud as-is, so we could operate them unchanged without disrupting our users. As more and more hosts moved from on-premises into Microsoft Azure, we eventually did a lift and shift on the Microsoft System Center servers themselves, so they were also operating out of a Microsoft Azure datacenter. Fair warning: there’s a tipping point once more than 50 percent of your environment is in the cloud, and how soon you hit it depends on the size of your environment and how quickly you’re moving VMs, so think about it ahead of time.

Along the way, we learned that, in many cases, a cloud transition coincides nicely with shifting your application team to a DevOps model of deployment and management. We realized this early, which allowed us to change our technology and site reliability engineering practices in unison. For the DMZ and other internet-facing solutions, there were other options. We made sure our VMs in our internet-facing environment were within Microsoft Azure Update Management, so they stayed up to date and monitored.

Driving Microsoft’s expedition to the cloud has taught Apple many lessons that he’s happy to share with customers. (Photo by Jim Adams | Inside Track)

For teams looking to move to a modern cloud solution like PaaS or SaaS, we encouraged other options rather than trying to duplicate past solutions. If an application was being refactored into a cloud native service without an operating system (and thus without a SCOM/SCCM agent), we used modern monitoring solutions like Microsoft Azure Application Insights and Microsoft Azure Monitor.

When I look back at Microsoft’s expedition to the cloud, it’s clear that we built the plane while flying it.

The evolution of moving to the cloud

Today, we in Microsoft Digital, Microsoft’s IT division, still operate a small Microsoft Endpoint Configuration Manager environment in corporate, which some teams continue to use for on-premises resources. All our Microsoft Azure resources have shifted to Azure native management, like Azure Monitor and Azure Update Management.

We had to learn to be flexible about management solutions because there are more options than just the simple “OS patch/monitor” world that we lived with for years.

– Pete Apple, cloud services engineer, Microsoft Digital

Moving to the cloud can feel like you’re “building the plane while you fly it,” so it’s critical that you get your management and monitoring right before you get started, says Pete Apple, a cloud services engineer in Microsoft Digital. (Photo by Jim Adams | Inside Track)

One pivotal lesson we learned early on was to share best practices across both our team and the company, so that no one had to make the same mistake twice. This helped us make sure we used the most current monitoring solutions and thinking each time we deployed a new application. For example, when one team started using Azure for management, we were able to share out what they learned, including using its update management and log analytics features to improve their operations.

Additionally, once we became a hybrid operation, we had to learn to be flexible about management solutions because there are more options than just the simple “OS patch/monitor” world that we lived with for years. This transition also changed the way we handle traditional Information Technology Infrastructure Library (ITIL) change and incident management, which brought a new set of challenges as we trekked further into the cloud. I’ll go into those next time.

The post Managing Microsoft Azure solutions on Microsoft’s expedition to the cloud appeared first on Inside Track Blog.

Automating Microsoft Azure incident and change management on Microsoft’s move to the cloud

Pete Apple — Wed, 08 Nov 2023 16:00:52 +0000

Microsoft’s move to the cloud has certainly been an adventure.

New technology has enabled us to transform many of our IT processes, and in some cases make them entirely disappear. It’s also compelled us to reevaluate our operational health and ability to stay on pace with evolving operational functions such as monitoring and patching, architectures, and change management.

As we’ve moved to the cloud, we have been focusing on aligning the company’s IT services with the needs of the business under an operational model formally known as Information Technology Infrastructure Library (ITIL).

Historically, we would create one- to two-year architectures and be fine! Now, we’re evaluating exciting new features at least on a quarterly basis. Our team has had to learn to be agile—both literally and metaphorically.

– Pete Apple, cloud services engineer, Microsoft Digital

You may be surprised (and perhaps a bit relieved) to learn that, from the point of view of a services engineer, our design and management functions have probably evolved the least on Microsoft’s move to the cloud. There’s certainly new technology to understand and incorporate into our architectural designs, but the team doing that work has basically remained the same. It’s been a great opportunity to learn about Microsoft Azure and how it handles compute, storage, data, and networks.

One thing that has certainly kept us on our toes has been the ever-evolving architectural changes that happen in the cloud. The Microsoft Azure team releases new features at more frequent intervals versus the traditional releases of the past. Historically, we would create one- to two-year architectures and be fine! Now, we’re evaluating exciting new features at least on a quarterly basis. Our team has had to learn to be agile—both literally and metaphorically (referencing the Agile methodology).

Microsoft Azure enabled our operations to evolve and become more productive, with a faster service turnaround time. A good example is our change management discipline.

Over four years ago, we had many standard change requests from our internal customers. I was running the private cloud at the time, and you can imagine the number and variety of requests that came across my desk: “Create a virtual machine,” “Install SQL,” “Rebuild the operating system,” and so on. Each request was a change record in our system that was immediately assigned to a system engineer to do the work with a pressing service-level agreement (SLA) of 72 hours.

Sound familiar?

As we trekked further on Microsoft’s move to the cloud, we took a hard look at every change type in the internal catalog and automated everything that could be automated.

We reviewed the number and variety of change orders coming through and realized that with some scripting advances, System Center Orchestrator, Azure Resource Manager (ARM) templates, and Azure Automation, we could start automating many of these change activities. This enabled us to cut back on human error, improve the SLA, and in many cases implement a self-service approach that lets internal customers deploy changes themselves instead of waiting on my team to implement them manually.
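One way to picture this catalog review is a dispatcher that runs an automated handler when one exists for a change type and falls back to the ticket queue otherwise. The sketch below is a minimal illustration, not our actual automation: the change types, handler functions, and the `handle_change` helper are hypothetical, and the handlers are placeholders for where real deployment tooling would be called.

```python
def create_vm(request):
    # Placeholder: a real handler would invoke deployment tooling here.
    return f"deployed VM '{request['name']}'"


def install_sql(request):
    # Placeholder: a real handler would run an installation runbook here.
    return f"installed SQL on '{request['name']}'"


# Change types that have been automated so far (hypothetical catalog).
AUTOMATED_HANDLERS = {
    "create_vm": create_vm,
    "install_sql": install_sql,
}


def handle_change(request):
    """Run the automated handler if one exists; otherwise queue for an engineer."""
    handler = AUTOMATED_HANDLERS.get(request["type"])
    if handler is None:
        return {"status": "queued_for_engineer", "detail": request["type"]}
    return {"status": "completed", "detail": handler(request)}


automated = handle_change({"type": "create_vm", "name": "web01"})
manual = handle_change({"type": "decommission_rack", "name": "rack12"})
```

Each change type added to the handler table is one less ticket waiting on an engineer, which is the effect the catalog review was after.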

Today, Microsoft Azure services are enabling Microsoft internal teams to self-service their own changes and skip the dreaded “open a ticket” model.

On the incident side, we also found similar ways to be more efficient.

Automating incident and change management through optimized architecture may sound a bit scary, but it’s been a real benefit to our organization.

– Pete Apple, cloud services engineer, Microsoft Digital

Pete Apple, cloud services engineer in Microsoft Digital, is driving Microsoft’s operational IT transformation with Microsoft Azure services. (Photo by Jim Adams | Inside Track)

As our Microsoft Azure migrations increased, we found that our customer application developers wanted to have direct access to their Azure subscriptions to do more rapid DevOps-type deployments. This meant in many cases that they were finding and discovering issues or incidents almost instantaneously. They didn’t need to have a central team fronting incident management as much as they used to.

In response, we transitioned our incident management into a hybrid model, where the application teams can choose to have Microsoft Azure Monitor and Application Insights alerts sent directly to them, while infrastructure alerts and outages still get forwarded to our centralized team. This has raised the skills required of some application teams, which now handle service reliability activities themselves, and it has improved time to resolution and bug fixes for those same teams. What we’ve maintained is our centralized “escalation management” function that can help manage a major incident (or in the new nomenclature, a “LiveSite”).
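The routing rule in this hybrid model is simple enough to sketch in Python. This is a hedged illustration only: the `Alert` type, the team names, and the `route_alert` function are assumptions made for the example, and it does not reflect how alerts are actually wired up in Azure Monitor or Application Insights.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    source: str   # "application" or "infrastructure"
    app: str      # name of the affected application
    message: str


CENTRAL_TEAM = "central-it"  # hypothetical name for the centralized team


def route_alert(alert, direct_teams):
    """Return the team that should receive the alert.

    direct_teams: {app: team} for application teams that opted in
    to receiving their own application alerts directly.
    """
    if alert.source == "application" and alert.app in direct_teams:
        return direct_teams[alert.app]
    # Infrastructure alerts, and alerts for teams that have not opted in,
    # still go to the centralized team.
    return CENTRAL_TEAM


direct_teams = {"invoicing": "invoicing-devops"}
to_app_team = route_alert(Alert("application", "invoicing", "HTTP 500 spike"), direct_teams)
to_central_infra = route_alert(Alert("infrastructure", "invoicing", "host unreachable"), direct_teams)
to_central_default = route_alert(Alert("application", "payroll", "slow queries"), direct_teams)
```

Keeping the centralized team as the default destination means opting in is a choice each application team makes, and nothing is dropped for teams that have not made it.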

Automating incident and change management through optimized architecture may sound a bit scary, but it’s been a real benefit to our organization. Removing some of the overhead in change management has cut costs in some cases by 30 to 40 percent and increased the speed of results for customers. I used to have a 48- to 72-hour SLA for building out a customer virtual machine. Now customers can spin one up in Microsoft Azure themselves in under 30 minutes!

Enabling teams to choose to receive alerts and incidents directly into their Microsoft Azure DevOps teams and escalate to central IT only when required empowers them to resolve items that impact their business more rapidly.

Unleashing Microsoft Azure and incorporating cloud patterns into architecture designs can really save time and costs for change management efforts, while improving the SLA and customer experience. But what does it mean for subscriptions and service over time? Check back with us soon as we continue the “Operationalizing the cloud” blog series and share insights and learnings from Microsoft’s move to the cloud.

Learn how Microsoft Azure services help configure and automate operational tasks across a hybrid environment, use ARM template documentation for efficient management, and provide a framework to manage the next generation of business apps and infrastructure.

The post Automating Microsoft Azure incident and change management on Microsoft’s move to the cloud appeared first on Inside Track Blog.

The learnings, pitfalls, and compromises of Microsoft’s expedition to the cloud

Pete Apple — Thu, 02 Nov 2023 16:00:42 +0000

Our expedition to the cloud started some time ago, well before moving to the cloud was a twinkle in Microsoft’s eye.

Yes, only a handful of us at Microsoft have been around long enough to hold coveted treasures like a 5¼-inch Windows NT release boot disk or a full set of long, shiny hair displayed on a ’90s Microsoft badge. But as exciting as those early computing times and hairstyles were, I’m finding what’s more invigorating is the remarkable odyssey my team and I have recently embarked on as we move from on-premises datacenters into the cloud.

Welcome to our blog series “Operationalizing the Cloud.” My team in Microsoft Digital, the organization that powers, protects, and transforms Microsoft, is the primary horizontal infrastructure group and we’re responsible for ensuring our internal customers have servers, storage, and databases, all the hard-crunchy bits of hosting, to run the critical applications that make Microsoft operate internally.

Who are you?

You’re me, only working for another company (and probably with the full set of hair that I no longer have). You’re a decision maker, technical subject matter expert, or both. You have either been thinking about moving your company to the cloud or have been given the direction by management to start doing so.

You’ve also been told to make sure everything runs as smoothly as, or better than, before. Maybe you’ve already started kicking the tires with cloud, you’ve got some production running and are ramping it up, or maybe you’re just starting to consider all the overwhelming intricacies that a cloud infrastructure involves. You’re wondering what the heck that’s going to mean for your operations, the various roles for your team, and what sort of issues you’re going to encounter that you haven’t even thought of yet.

That’s where I come in.

For the last seven-plus years we’ve been working on this inside Microsoft—we’ve gone from zero resources in Microsoft Azure to nearly 80 percent hosted in the cloud. I’ve been involved for most of that effort, including helping decide our direction technically and operationally and determining how our operational framework needed to change over time to support a hybrid environment. We went down some paths that we had to back out of (or are still trying to back out of in some cases). And we also had to go back to our internal customers and really listen to what they wanted, compromise, and iterate our services to enable them to get to the cloud more easily.

Apple spends much of his time walking customers through Microsoft’s internal journey to the cloud. (Photo by Jim Adams | Inside Track)

One of the foremost compromises we had to make concerned how we moved applications into the cloud under pressing timelines. Ideally, if you’re looking for the most efficient way to move an application, you would consider refactoring it into a microservice or cloud native architecture. However, in our case, we had some strict deadlines to meet on existing datacenters and not necessarily enough budget to invest in a massive refactor of all the applications that had to go out.

Because of that, we employed the popular lift-and-shift migration strategy, which took our applications from on-premises systems to Microsoft Azure datacenters as IaaS virtual machines. As a colleague aptly put it: it’s like taking boxes out of your garage and putting them into offsite storage without having to unpack or rearrange them. It got the desired datacenters closed on time, and we can go back and revisit the applications over time and as budget allows.

Pete Apple’s expedition to the cloud began as a Microsoft vendor who worked on some very early versions of Microsoft software. Since then, the cloud services engineer has been on quite the journey helping the company move to the cloud. (Photo by Jim Adams | Inside Track)

As we trekked further into our expedition to the cloud, we found we needed to rethink the way we handled our management and monitoring as well as how the Microsoft Azure subscription service was delivered. We quickly embraced the agile methodologies other parts of our organization were moving to for software development and started using them for our service engineering and operations.

In the next few blogs, I’m going to dive a little deeper into each of these processes to give you an idea of where we were and where we got to with our Microsoft Azure service, with only a few bumps and bruises along the way. Hopefully we can save you some time, and headache, as you embark or move along your cloud computing journey. Meet me back here in a few weeks and let’s keep talking cloud.

The post The learnings, pitfalls, and compromises of Microsoft’s expedition to the cloud appeared first on Inside Track Blog.