Azure and cloud infrastructure Archives - Inside Track Blog http://approjects.co.za/?big=insidetrack/blog/tag/azure-and-cloud-infrastructure/ How Microsoft does IT Thu, 09 Apr 2026 14:57:27 +0000

Protecting anonymity at scale: How we built cloud-first hidden membership groups at Microsoft http://approjects.co.za/?big=insidetrack/blog/protecting-anonymity-at-scale-how-we-built-cloud-first-hidden-membership-groups-at-microsoft/ Thu, 26 Feb 2026 17:00:00 +0000

The post Protecting anonymity at scale: How we built cloud-first hidden membership groups at Microsoft appeared first on Inside Track Blog.

Some Microsoft employee groups can’t afford to be visible.

For years, we supported email‑based communities internally here at Microsoft whose very existence depends on anonymity. These include employee resource groups, confidential project teams, and other sensitive audiences where simply revealing who belongs can create real‑world risk.

Traditional distribution groups make membership discoverable by default. Owners can see members. Admins can see members. In some cases, other users can infer membership through directory queries or tooling.

That model doesn’t work when anonymity is a requirement.

A photo of Reifers.

“When the SFI wave hit, it was made clear to us that we needed to keep our people safe, and to do that, we needed to build a new hidden memberships group MVP. We needed to raise the bar with modern groups, and we needed to do it in six months or miss meeting our goals.”

Brett Reifers, senior product manager, Microsoft Digital

For over 15 years, we relied on a custom, on‑premises solution that enabled employees to send and receive messages through groups with fully hidden memberships.

The system worked, but we were deprecating the Microsoft Exchange servers that it ran on. At the same time, we were also deploying our Secure Future Initiative (SFI), which required us to reassess legacy systems that could expose sensitive data or slow incident response, including hidden membership groups.

The system wasn’t broken, but it represented concentrated risk simply by existing outside our modern cloud controls and monitoring.

“When the SFI wave hit, it was made clear to us that we needed to keep our people safe, and to do that, we needed to build a new hidden memberships group MVP,” says Brett Reifers, a senior product manager in Microsoft Digital, the company’s IT organization. “We needed to raise the bar with modern groups, and we needed to do it in six months or miss meeting our goals.”

The mandate was clear. Preserve anonymity, eliminate on‑premises dependencies, and do it quickly.

A photo of Carson.

“Our solution would enable us to deprecate our legacy on-premises Exchange hardware while maintaining the privacy of our employee groups, and it would do so in a cloud-first manner.”

Nate Carson, principal service engineer, Microsoft Digital

Instead of retrofitting hidden membership into standard Microsoft 365 groups, we asked a different question: What if the group lived somewhere else entirely? What if users interacted with a simple, secure front end, while all membership expansion and mail flow occurred in a locked‑down tenant built specifically for this purpose?

That idea became the foundation for Hidden Membership Groups: A new cloud‑first architecture that would separate the user experience from membership data, leverage first‑party Microsoft services, and keep our group memberships hidden from everyone—including owners and administrators—by design.

“Our solution would enable us to deprecate our legacy on-premises Exchange hardware while maintaining the privacy of our employee groups, and it would do so in a cloud-first manner,” says Nate Carson, a principal service engineer in Microsoft Digital.

Once we settled on a solution, our next step was to build support for solving a problem that few people had thought much about.

“Not everyone was aware of how serious of a situation we were in,” Carson says. “We had to show everyone what was at stake, and to share our solution with them.”

After taking their plan on the road, the team got the buy-in it needed, and that’s when the real work started.

Planning to solve business problems with security built-in

Before we designed anything, we had to be clear about what success meant.

Hidden Membership Groups aren’t just another collaboration feature. They support scenarios where anonymity isn’t optional—it’s foundational. That reality shaped every requirement that we built into our solution, including:

1. Absolute privacy

Group membership couldn’t be visible to users, group owners, or administrators—under any circumstances. That requirement immediately ruled out standard group models.

2. Cloud only

Any new solution had to live entirely in our cloud, use first‑party services, and align with modern identity, security, and compliance practices. On‑premises infrastructure wasn’t an option.

3. Scale

Some groups had a handful of members. Others had tens of thousands. Membership changed frequently, and those changes had to propagate safely and predictably without exposing data or degrading performance.

4. Separation of concerns

User interaction and membership truth couldn’t live in the same place. Employees needed a simple way to discover groups, request access, and manage participation, without ever interacting with the system that stored or expanded membership.

5. Self‑service with guardrails

The solution needed to reduce operational overhead, not introduce a new bottleneck. Group lifecycle management had to be automated, auditable, and secure, while still giving teams flexibility.

6. Simple to use

Employees shouldn’t need special training. They shouldn’t need to understand tenants, identity synchronization, or mail routing. The experience needed to be intuitive, consistent, and accessible—without compromising security.

Once those requirements were clear, our solution started to emerge. Incremental changes wouldn’t be enough. A traditional group model wouldn’t work. The solution required a new architecture—one designed around isolation, automation, and intentional limitation.

That’s when we started the engineering work.

Creating a cloud-first architecture

Designing for hidden membership meant eliminating ambiguity. If any surface could reveal membership, even indirectly, it didn’t belong in the design.

That constraint led us toward a model built on strict isolation, explicit APIs, and intentionally narrow interfaces. The result is straightforward to use, but deliberately difficult to interrogate.

Two tenants, with sharply separated responsibilities

At the foundation of the solution is a two‑tenant model.

Our primary Microsoft 365 tenant is where employees authenticate, discover groups, and initiate actions. A secondary, isolated tenant hosts the distribution lists and performs mail expansion for Hidden Membership Groups.

A photo of Mace.

“Tenant isolation is what makes the privacy guarantee real. By moving membership expansion to a tenant that users and owners can’t access, we removed the possibility of accidental exposure. The system simply doesn’t give you a place where membership can be seen.”

Chad Mace, principal architect, Microsoft Digital

That separation matters because the secondary tenant isn’t designed for interactive use. Only Exchange and the minimum directory constructs required for mail routing and expansion are enabled.

Operationally, when an employee sends email to a Hidden Membership Group, they send to a mail contact visible in the corporate tenant. That contact routes to the corresponding distribution group in the isolated tenant, where membership expansion occurs. Expanded messages are then delivered back to recipients’ inboxes in the corporate tenant, so sent and received mail lives where users already work.
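Conceptually, the flow above can be modeled in a few lines of plain Python. All names and data structures here are illustrative assumptions; the production system uses Exchange mail contacts and distribution groups, not these dictionaries.

```python
# Conceptual sketch of the two-tenant mail flow (illustrative names only).

CORPORATE_CONTACTS = {
    # Address visible in the corporate tenant -> group ID in the isolated tenant.
    "erg-community@corp.example.com": "grp-001",
}

ISOLATED_TENANT_GROUPS = {
    # Membership lives only in the isolated tenant; nothing in the corporate
    # tenant can enumerate it.
    "grp-001": ["alice@corp.example.com", "bob@corp.example.com"],
}

def send_to_hidden_group(to_address, message):
    """Route mail through the corporate contact, expand membership in the
    isolated tenant, and return (recipient, message) pairs for delivery
    back to corporate inboxes."""
    group_id = CORPORATE_CONTACTS[to_address]        # corporate tenant: routing only
    members = ISOLATED_TENANT_GROUPS[group_id]       # isolated tenant: expansion
    return [(member, message) for member in members]

deliveries = send_to_hidden_group("erg-community@corp.example.com", "Meeting at 3pm")
print([recipient for recipient, _ in deliveries])
# -> ['alice@corp.example.com', 'bob@corp.example.com']
```

The point of the shape is that the sender only ever touches the corporate-side contact; expansion happens in a structure the sender has no way to read.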

“Tenant isolation is what makes the privacy guarantee real,” says Chad Mace, a principal architect in Microsoft Digital. “By moving membership expansion to a tenant that users and owners can’t access, we removed the possibility of accidental exposure. The system simply doesn’t give you a place where membership can be seen.”

Identity without interactive access

This isolated tenant only works if it can resolve recipients. To enable that, our development team used Microsoft Entra ID multi‑tenant organization identity sync to represent corporate users in the secondary tenant.

These identities are treated as business guest identities, and we disable sign‑in to prevent interactive access. The tenant can perform expansion, but nothing more.

However, complete isolation wasn’t technically possible. Privileged access always exists at some level. The design response was to minimize that exposure. Access to the isolated tenant is tightly restricted, and membership changes flow through automation rather than broad UI-based administration.

The goal: reduce exposure to the smallest viable operational group.

API-first automation as the control plane

With the tenancy and identity model established, the team needed a single, consistent way to create groups, connect objects across tenants, and manage changes without introducing new administrative workflows. That’s where the APIs come in.

A photo of Pena II.

“We split the backend into multiple APIs so the system could scale without becoming fragile. That let us separate everyday operations from high-volume membership work and keep performance predictable.”

John Pena II, principal software engineer, Microsoft Digital

The backend is intentionally modular, split into three distinct APIs:

  • The control API handles group creation, configuration, and cross‑tenant coordination.
  • The membership API handles standard add and remove operations.
  • The bulk membership APIs handle large‑scale operations involving tens of thousands of users, with services designed to run long‑lived jobs, manage throttling, and recover from partial failures.

“We split the backend into multiple APIs so the system could scale without becoming fragile,” says John Pena II, a principal software engineer in Microsoft Digital. “That let us separate everyday operations from high-volume membership work and keep performance predictable.”
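The bulk pattern the team describes (batching, backing off under throttling, and recovering from partial failures) can be sketched in plain Python. Everything here, including the apply_batch callable and the use of RuntimeError to stand in for a throttling response, is an illustrative assumption rather than the production API.

```python
import time

def run_bulk_job(members, apply_batch, batch_size=100, max_retries=3, backoff=0.1):
    """Apply membership changes in batches; return members that still failed
    after retries so a recovery pass can pick them up."""
    failed = []
    for start in range(0, len(members), batch_size):
        batch = members[start:start + batch_size]
        for attempt in range(max_retries):
            try:
                apply_batch(batch)                    # stand-in for the cross-tenant write
                break
            except RuntimeError:                      # stand-in for a throttling response
                time.sleep(backoff * (2 ** attempt))  # exponential backoff
        else:
            failed.extend(batch)                      # retries exhausted: save for recovery
    return failed

# Demo: a flaky backend that throttles every other call.
calls = {"n": 0}
def flaky_apply(batch):
    calls["n"] += 1
    if calls["n"] % 2 == 1:
        raise RuntimeError("throttled")

leftover = run_bulk_job([f"user{i}" for i in range(250)], flaky_apply, backoff=0)
print(leftover)  # -> [] (every batch succeeded on its retry)
```

Keeping this long-running, retry-heavy path separate from the routine add/remove API is what lets everyday operations stay fast while a 30,000-member change grinds through in the background.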

The APIs run as PowerShell-based Azure Functions and use managed identity patterns, including federated identity credentials, to securely connect across tenants.

Creating the user experience with Power Apps

For the front end, we built a Canvas app in Power Apps, backed by Dataverse. The goal was speed and flexibility, without compromising strict privacy boundaries.

By using Power Apps as the primary interaction layer, we deliver a secure, modern experience without unnecessary custom infrastructure. The Canvas app provides a single, focused surface for discovering, joining, and managing hidden membership groups, while all sensitive operations remain behind controlled APIs and tenant boundaries. This separation allows the team to iterate quickly on experience design without weakening the privacy guarantees that the solution depends on.

Power Platform also simplifies how security is enforced across the solution. Dataverse enables fine‑grained, role‑based access, ensuring users only see data they’re entitled to see—while keeping sensitive membership information entirely out of the client layer. That reduces long‑term maintenance overhead and makes it easier to evolve the solution as requirements change.

“From the beginning, we designed everything with security roles and workflows in mind,” says Shiva Krishna Gollapelly, senior software engineer in Microsoft Digital. “Dataverse let us control who could see or change data without building additional APIs or storage layers, and keeping everything inside the Power Apps ecosystem saved us a lot of maintenance over time.”

Dataverse plays a precise role here: it maintains the datastore the app needs to function without becoming a secondary membership repository.

A photo of Amanishahrak.

“Using the Power Platform let us move fast, integrate deeply with Microsoft identity, and enforce security without building a full web stack from scratch.”

Bita Amanishahrak, software engineer II, Microsoft Digital

From a security posture perspective, Dataverse security is used intentionally to restrict what different users can see and do, and the Power App was developed with security roles and workflows in mind.

Short version: the app brokers intent, the APIs execute it, and all the pieces that need to stay separate do exactly that.

“Using the Power Platform let us move fast, integrate deeply with Microsoft identity, and enforce security without building a full web stack from scratch,” says Bita Amanishahrak, a software engineer in Microsoft Digital.

The architectural intent is consistent throughout—isolate the sensitive plane and ensure the user plane operates only through controlled interfaces.

Benefits and impact

The most important outcome of the new architecture is also the simplest: Hidden membership stays hidden.

Anonymity isn’t enforced by policy. It’s enforced by architecture. Membership data never appears in the user experience or administrative tooling, and it doesn’t surface as a side effect of scale.

“We’re no longer asking people to trust that we’ll handle sensitive membership carefully through process,” Reifers says. “The system makes exposure structurally impossible.”

The impact was immediate.

At launch, we migrated more than 2,200 hidden membership groups, representing over 200,000 users, from the legacy on‑premises system into the new cloud‑first architecture. Groups ranged from small, tightly controlled communities to audiences with tens of thousands of members, all supported without special handling.

“Some of these groups are massive,” Pena says. “We knew from the beginning we were dealing with memberships in the tens of thousands, which is why we designed bulk operations as a first‑class capability instead of an afterthought.”

The separation between routine APIs and bulk‑membership APIs proved critical, enabling large migrations and ongoing changes without degrading day-to-day performance.

Operationally, moving to a cloud‑only model reduced both risk and complexity. Decommissioning the on‑premises Exchange infrastructure eliminated specialized maintenance requirements and improved the alignment of monitoring, auditing, and access controls with our modern cloud standards.

Delivery speed also mattered. Driven by Secure Future Initiative urgency and strong executive sponsorship, the team designed and delivered a minimum viable product in less than six months.

“That timeline forced discipline,” Reifers says. “We focused on what mattered: Security, privacy guarantees, scale, and a UX that wouldn’t disrupt group owners and members who had relied on a 15-year-old tool.”

Everything else was secondary.

A photo of Gollapelly.

“Most users never think about tenants or APIs. They just see a clean experience that does what they need, without exposing anything it shouldn’t.”

Shiva Krishna Gollapelly, senior software engineer, Microsoft Digital

From an employee perspective, the experience became simpler and safer. Users now interact through a Power Platform app consistent with the rest of Microsoft 365.

Discovering a group, requesting access, or leaving a group no longer requires understanding the architecture behind it.

“Most users never think about tenants or APIs,” Gollapelly says. “They just see a clean experience that does what they need, without exposing anything it shouldn’t.”

The result is sustainable. The platform protects anonymity at scale, simplifies operations, boosts resiliency, and can evolve without reopening core privacy questions.

Moving forward

Delivering the initial solution was only the beginning.

The team sees Hidden Membership Groups as more than a single solution. It’s a reusable pattern for sensitive collaboration in a cloud‑first world: isolate what matters most, automate everything else, and design experiences that don’t require trust to be safe.

As adoption grows, the team plans to support additional anonymity-sensitive scenarios while maintaining the same underlying model.

“We don’t want every sensitive scenario inventing its own workaround,” Mace says. “This gives us a pattern we can reuse confidently.”

Future priorities include improving lifecycle and ownership experiences, strengthening auditing and reporting for approved administrators, and enhancing self‑service workflows—without compromising membership privacy. If it risks exposing membership, it doesn’t ship.

With the legacy system fully retired, Reifers reflects on what the team accomplished to get here.

“We shipped a new enterprise pattern in six months using our first-party tools,” Reifers says. “We achieved this because a stellar team cared about the mission. That’s the takeaway.”

Key takeaways

Use these tips to strengthen your privacy, simplify your operations, and future-proof your organization’s collaboration systems:

  • Prioritize privacy by design. Embed privacy considerations from the start to protect sensitive information in all collaboration scenarios.
  • Architect for scale. Treat bulk operations as a first-class capability to support large groups efficiently.
  • Automate and modernize workflows. Replace legacy systems with cloud-native solutions to reduce risk, improve transparency, and enable continuous improvement.
  • Streamline user experience. Provide intuitive, consistent interfaces that make it easy for users to access, join, or leave groups without requiring technical knowledge.
  • Enforce strict access and auditing controls. Align monitoring and administration with modern cloud standards to maintain security and accountability.
  • Create reusable patterns. Establish and share successful privacy patterns to avoid reinventing solutions for each new case.
  • Focus on operational simplicity and resilience. Design systems that are easy to maintain and improve, freeing up teams to concentrate on innovation rather than upkeep.

Moving from a ‘Scream Test’ to holistic lifecycle management: How we manage our Azure services at Microsoft http://approjects.co.za/?big=insidetrack/blog/moving-from-a-scream-test-to-holistic-lifecycle-management-how-we-manage-our-azure-services-at-microsoft/ Thu, 20 Nov 2025 17:05:00 +0000

Nearly a decade ago, as we began our journey from relying on on-premises physical computing infrastructure to being a cloud-first organization, our engineers came up with a simple but effective technique to see if a relatively inactive server was really needed.

They dubbed it the “Scream Test.”

“We didn’t have a great server inventory and tracking system, and we didn’t always know who owned a server,” says Brent Burtness, a principal software engineer in Commerce Financial Platforms, who was one of the leaders for the effort in his group. “So, we essentially just turned them off. If someone screamed—‘Hey, why’d you turn off my server?’—then we’d know it was still being used.”

Today, the basic idea behind the Scream Test is being used across the company, but in a more holistic way. Importantly, it’s been incorporated into the overall lifecycle management of our computing infrastructure. And, through the automation tools provided by Microsoft Azure, we have a much more efficient process for making sure that we’re saving time and money by reducing the number of underused machines we operate, monitor, and maintain.

A photo of Apple.

“We thought we were going to get rid of a small number of machines that weren’t being used. But we found the actual share was about 15% of all machines, which saved us a lot of effort of moving those unused machines to the cloud. In other words, we downsized on the way to the cloud, rather than after the fact.”

Pete Apple, cloud network engineering architect, Microsoft Digital

Uncovering more than expected

The Scream Test was part of the huge effort to evaluate our on-premises compute resources before we began moving to the Azure cloud. After all, why spend resources moving something that isn’t needed?

Pete Apple, who helped develop the concept of the Scream Test, is a cloud network engineering architect in Microsoft Digital, the company’s IT organization. Looking back, he remembers the surprising results that emerged when they began shutting down specific servers to see who noticed.

“We thought we were going to get rid of a small number of machines that weren’t being used,” Apple says. “But we found the actual share was about 15% of all machines, which saved us a lot of effort of moving those unused machines to the cloud. In other words, we downsized on the way to the cloud, rather than after the fact.”

As part of this process, Apple explains, our engineers looked at two related factors to reduce inefficiencies in our usage of computing resources.

The first was to identify systems that were used infrequently, at a very low level of CPU (sometimes called “cold” servers). From that, we could determine which systems in our on-premises environments were oversized—meaning someone had purchased physical machines according to what they thought the load would be, but either that estimate was incorrect or the load diminished over time. We took this data and created a set of recommended Microsoft Azure Virtual Machine (VM) sizes for every on-premises system to be migrated.

“We learned that there’s a lot of orphaned, or underutilized, resources out there,” Burtness says. “These were cases where the workload was so small on a server—like under 5% CPU—that it didn’t make sense to host it on its own machine. We could then move the task or application and get it down to just one or two CPUs on a virtual machine.”
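The kind of check Burtness describes can be sketched as a simple utilization scan. The 5% threshold comes from his example above; the data shape and function name are assumptions, not the tooling the team actually used.

```python
# Sketch: average each server's sampled CPU utilization and flag anything
# under the threshold as a consolidation candidate.

def find_cold_servers(cpu_samples_by_server, threshold_pct=5.0):
    """Return (server, average CPU %) pairs for underutilized machines."""
    cold = []
    for server, samples in cpu_samples_by_server.items():
        avg = sum(samples) / len(samples)
        if avg < threshold_pct:
            cold.append((server, round(avg, 1)))
    return sorted(cold)

usage = {
    "web-01": [42.0, 55.3, 38.9],   # busy: keep as-is
    "batch-07": [1.2, 0.8, 2.5],    # cold: consolidation candidate
    "qa-03": [3.9, 4.1, 2.0],       # cold: consolidation candidate
}
print(find_cold_servers(usage))  # -> [('batch-07', 1.5), ('qa-03', 3.3)]
```

In practice, the flagged machines became the ones whose workloads were consolidated onto small VMs during migration.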

At the time, we did much of this work manually, because we were early adopters. Microsoft now offers a number of products to assist with this kind of review of your on-premises environment, led by Azure Migrate.

Another part of the process was determining which systems were being used for only a few days a month or at certain busy times of the year. These development machines, test/QA machines, and user acceptance testing machines (reserved for final verification before moving code to production) were running continuously in the datacenter but were really only needed during limited windows. For these situations, we applied the tools available in Azure Resource Manager Templates and Azure Automation to ensure the machines would only run when needed.
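The decision logic behind those limited-window schedules can be sketched as follows. The real implementation used Azure Resource Manager templates and Azure Automation, so this plain-Python function is only a conceptual stand-in, with hypothetical parameters.

```python
from datetime import date

def should_run(today, active_months, active_days=None):
    """Return True if a machine that is only needed in certain windows
    should be powered on today."""
    if today.month not in active_months:
        return False                                  # outside the busy season
    return active_days is None or today.day in active_days

# A user acceptance testing machine needed only during quarter-end months:
quarter_end = {3, 6, 9, 12}
print(should_run(date(2025, 6, 15), quarter_end))   # True
print(should_run(date(2025, 7, 15), quarter_end))   # False
```

Wiring a check like this into an automation schedule means the machine accrues compute cost only during the windows when it actually does work.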

Automating with Azure

Today, we don’t have to rely on anything as crude as the Scream Test to find unused and underused computing resources. With 98% of our IT resources operating in the Azure cloud, we have much greater insight into how efficient our network is, so much of the process can be automated.

“We’ve found this effort much easier to manage in the cloud, because all our computing resources are integrated with the Azure portal,” Apple says. “They have an API system and offer various tools within Azure Update Manager and Azure Advisor to help with cost efficiency. It’s kind of like a modern version of Clippy—’Hey, it looks like your VM isn’t being used much. Do you want to downsize that or turn it off?'”

(For the uninitiated, Clippy was the Microsoft Office animated paperclip assistant introduced in the late 1990s. It offered tips and help with tasks, like writing and formatting documents. Clippy became iconic for its quirky suggestions, including recommending that you remove things from your desktop that you weren’t using.)

Burtness smiles in a portrait photo.

“With everything being in the Azure portal or in Azure Resource Graph, it’s much more streamlined, and makes it easier to get that data out to the teams. They can then go into the portal and clean up the resource.”

Brent Burtness, principal software engineer, Commerce Financial Platforms

And simply taking the step of turning off stuff that we weren’t using turned out to be very effective. Thanks, Clippy!

Today, we approach this challenge in a more efficient and sophisticated way, taking advantage of Azure tools like Update Manager and Advisor.

“With everything being in the Azure portal or in Azure Resource Graph, it’s much more streamlined, and makes it easier to get that data out to the teams,” Burtness says. “We can run automated queries with Azure Resource Graph. Then we bring that information into our internal Service 360 tool, which we use to give action items to our developers. Each item gives them a link to Azure portal, and they can then go into the portal and clean up the resource.”
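A sketch of that reporting flow: turn an exported resource inventory (for example, the output of an Azure Resource Graph query) into per-team action items that link back to the Azure portal. The record shape, the idle-days field, and the hand-off to the internal Service 360 tool are all illustrative assumptions.

```python
def build_action_items(resources, max_idle_days=30):
    """Produce one action item per resource idle longer than the threshold."""
    items = []
    for r in resources:
        if r["idle_days"] >= max_idle_days:
            items.append({
                "team": r["owner_team"],
                "action": f"Review idle resource {r['name']}",
                "link": f"https://portal.azure.com/#@/resource{r['id']}",
            })
    return items

inventory = [
    {"name": "vm-old-build", "id": "/subscriptions/s1/rg/vm-old-build",
     "owner_team": "CFP-Infra", "idle_days": 90},
    {"name": "vm-ci-agent", "id": "/subscriptions/s1/rg/vm-ci-agent",
     "owner_team": "CFP-Infra", "idle_days": 2},
]
print([i["action"] for i in build_action_items(inventory)])
# -> ['Review idle resource vm-old-build']
```

The essential idea is that each item carries both an owner and a direct link, so the team receiving it can jump straight to the resource and clean it up.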

Managing for the lifecycle

One of the most important things we learned by using the Scream Test to identify inefficiencies and moving our systems from on-premises servers to the cloud was that it’s an ongoing process, not a fixed-end project.

“We had this idea that it was going to be a one-time event, that we’ll move to the cloud and then we’ll be done,” Apple says. “A better understanding is that it’s a lifecycle. We have integrated this concept of continual evaluation into our processes around everything that’s still on-premises, because we still have labs, we still have physical infrastructure.”

We continue to do this evaluation on a regular basis with both physical and virtual computing resources, because needs and usage are constantly changing.

Cutting our cloud costs

A text graphic shows the savings that one group at Microsoft achieved by becoming more efficient in their compute usage.
In a pilot set of Azure subscriptions, the Commerce Financial Platforms team reduced usage by 233 resources across 36 subscriptions and 17 services in 6 team groups, saving more than $15,000 in monthly operating costs.

“Now we have a basic process around a six-month cycle,” Apple says. “So, every six months we ask, does this still need to be on-premises or should we start moving it to the cloud? And we do the same thing with our cloud resources. Who’s still using these VMs? And we still go through the same review process to see if it’s needed, or if we can shut it down or move it.”

This has resulted in significant cost savings for the company. “We’re up to about 15% to 20% less compute cost, depending on the organization, because of this much better understanding of our business needs,” Apple says.

Better governance, increased security

Another major benefit of this process was establishing much stronger governance of compute resources across the entire organization.

“When we first did the Scream Test, we weren’t always really sure who owned what, in some cases,” Apple says. “We’ve fixed that as part of this process. This governance aspect is a key part of being more efficient with our resources.”

Burtness explains why this is so important.

“It’s critical to know exactly who to contact when there’s something wrong with the server,” Burtness says. “Now, with clearer ownership, clearer accountability, and better inventory, it’s a much better experience.”

Better governance also means tighter security, according to both Apple and Burtness.

“This is really important when it comes to threat-actor response,” Apple says. “Unused servers can often be an entry point for hackers. Or, say we discover that a machine or server is getting hacked; you need to talk to who owns it. If you don’t know, it takes you longer to track them down and combat the hack. That’s not great. Improving our governance has definitely made securing our environment easier.”

Key takeaways

Here are some things to keep in mind when managing your own enterprise compute resources for greater efficiency:

  • It’s not a one-time exercise. For the best results, you should be evaluating your computing resources on a regular schedule to identify “cold” servers and unused infrastructure.
  • Adjust for variable usage patterns. It’s not just about unused servers. Some machines may only be needed for a business function during certain busy times of the year. Consider turning the machines on just to handle the load during those periods and turning them off the rest of the year.
  • Use Azure tools for greater insight. If you’re operating your infrastructure in the Azure cloud, you can much more easily monitor and address orphaned resources using automated tools such as Azure Advisor, Azure Resource Graph, and the Azure portal.
  • Apply your savings to other priorities. “The more efficient you are, the more savings can be applied to other projects or given back to your manager—who is going to be very happy with you,” Apple says.
  • Saving money is not the only benefit. You’ll not only save operating costs, you’ll have a reduced maintenance and monitoring load, better governance, and fewer security vulnerabilities.

Modernizing IT infrastructure at Microsoft: A cloud-native journey with Azure http://approjects.co.za/?big=insidetrack/blog/modernizing-it-infrastructure-at-microsoft-a-cloud-native-journey-with-azure/ Thu, 04 Sep 2025 16:00:00 +0000


Engage with our experts!

Customers or Microsoft account team representatives from Fortune 500 companies are welcome to request a virtual engagement on this topic with experts from our Microsoft Digital team.

At Microsoft, we are proudly a cloud-first organization: Today, 98% of our IT infrastructure—which serves more than 200,000 employees and incorporates over 750,000 managed devices—runs on the Microsoft Azure cloud.

The company’s massive transition from traditional datacenters to a cloud-native infrastructure on Azure has fundamentally reshaped our IT operations. By adopting a cloud-first, DevOps-driven model, we’ve realized significant gains in agility, scalability, reliability, operational efficiency, and cost savings.

“We’ve created a customer-focused, self-serve management environment centered around Azure DevOps and modern engineering principles,” says Pete Apple, a technical program manager and cloud architect in Microsoft Digital, the company’s IT organization. “It has really transformed how we do IT at Microsoft.”

“Our service teams don’t have to worry about the operating system. They just go to a website, fill in their info, add their data, and away they go. That’s a big advantage in terms of flexibility.”

Apple is shown in a portrait photo.
Pete Apple, technical program manager and cloud architect, Microsoft Digital

What it means to move from the datacenter to the cloud

Historically, our IT environment was anchored in centralized, on-premises datacenters. The initial phase of our cloud transition involved a lift-and-shift approach, migrating workloads to Azure’s infrastructure as a service (IaaS) offerings. Over time, the company evolved toward more of a decentralized, platform as a service (PaaS) DevOps model.

“In the last six or seven years we’ve seen a lot more focus on PaaS and serverless offerings,” says Faisal Nasir, a principal architect in Microsoft Digital. “The evolution is also marked by extensibility—the ability to create enterprise-grade applications in the cloud—and how we can design well-architected end-to-end services.”

Because we’ve moved nearly all our systems to the cloud, we have a very high level of visibility into our network operations, according to Nasir. We can now leverage Azure’s native observability platforms, extending them to enable end-to-end monitoring, debugging, and data collection on service usage and performance. This capability supports high-quality operations and continuous improvement of cloud services.

“Observability means having complete oversight in terms of monitoring, assessments, compliance, and actionability,” Nasir says. “It’s about being able to see across all aspects of our systems and our environments, and even from a customer lens.”

Decentralizing our IT services with Azure

As Microsoft was becoming a cloud-first organization, the nature of the cloud and how we use it changed. As Microsoft Azure matured and more of our infrastructure and services moved to the cloud, we began to move away from IT-owned applications and services.

The strength of Azure’s self-service and management features means that individual business groups can handle many of the duties Microsoft Digital formerly delivered as an IT service provider—enabling each group to build agile solutions that match its specific needs.

“Our goal with our modern cloud infrastructure continues to be a solution that transforms IT tasks into self-service, native cloud solutions for monitoring, management, backup, and security across our entire environment,” Apple says. “This way, our business groups and service lines have reliable, standardized management tools, and we can still maintain control over and visibility into security and compliance for our entire organization.”

The benefits to our businesses of this decentralized model of IT services include:

  • Empowered, flexible DevOps teams
  • A native cloud experience: subscription owners can use features as soon as they’re available
  • Freedom to choose from marketplace solutions
  • Minimal subscription limit issues
  • Greater control over groups and permissions
  • Better insights into Microsoft Azure provisioning and subscriptions
  • Business group ownership of billing and capacity management

“With the PaaS model, and SaaS (software as a service), it’s more DIY,” Apple says. “Our service teams don’t have to worry about the operating system. They just go to a website, fill in their info, add their data, and away they go. That’s a big advantage in terms of flexibility.”

“The idea of centralized monitoring is gone. The new approach is that service teams monitor their own applications, and they know best how to do that.”

Delamarter is shown in a portrait photo.
Cory Delamarter, principal software engineering manager, Microsoft Digital

Leveraging the power of Azure Monitor

Microsoft Azure Monitor is a comprehensive monitoring solution for collecting, analyzing, and responding to monitoring data from cloud and on-premises environments. Across Microsoft, we use Azure Monitor to ensure the highest level of reliability for our services and applications.

Specifically, we rely on Azure Monitor to:

Create visibility. There’s instant access to fundamental metrics, alerts, and notifications across core Azure services for all business units. Azure Monitor also covers production and non-production environments as well as native monitoring support across Microsoft Azure DevOps.

Provide insight. Business groups and service lines can view rich analytics and diagnostics across applications and their compute, storage, and network resources, including anomaly detection and proactive alerting.

Enable optimization. Monitoring results help our business groups and service lines understand how users are engaging with their applications, identify sticking points, develop cohorts, and optimize the business impact of their solutions.

Deliver extensibility. Azure Monitor is designed for extensibility to enable support for custom event ingestion and broader analytics scenarios.
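The anomaly detection and proactive alerting described above can be illustrated with a small, self-contained sketch. This is not Azure Monitor’s actual implementation—the function name, window size, and threshold are all illustrative—but it shows the core idea: flag metric samples that deviate from a rolling baseline by more than a few standard deviations.

```python
from statistics import mean, stdev

def detect_anomalies(samples, window=5, threshold=3.0):
    """Flag indices whose value deviates from the rolling baseline
    by more than `threshold` standard deviations."""
    anomalies = []
    for i in range(window, len(samples)):
        baseline = samples[i - window:i]
        mu, sigma = mean(baseline), stdev(baseline)
        if sigma == 0:
            # Flat baseline: any change at all is anomalous.
            if samples[i] != mu:
                anomalies.append(i)
        elif abs(samples[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies

# A CPU-utilization series with one obvious spike at index 8.
cpu = [40, 42, 41, 43, 40, 41, 42, 40, 95, 41]
print(detect_anomalies(cpu))  # → [8]
```

In Azure Monitor itself, the equivalent behavior comes from dynamic-threshold metric alerts rather than hand-rolled code like this.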

Because we’ve moved to a decentralized IT model, much of the monitoring work has moved to the service team level as well.

“The idea of centralized monitoring is gone,” says Cory Delamarter, a principal software engineering manager in Microsoft Digital. “The new approach is that service teams monitor their own applications, and they know best how to do that.”

Patching and updating, simplified

Moving our operations to the cloud also means a simpler and more automated approach to patching and updating. The shift to PaaS and serverless networking has allowed us to manage infrastructure patching centrally, which is much more scalable and efficient. The extensibility of our cloud platforms reduces integration complexity and accelerates deployment.

“It depends on the model you’re using,” Nasir says. “With the PaaS and serverless networks, the service teams don’t need to worry about patching. With hybrid infrastructure systems, being in the cloud helps with automation of patching and updating. There’s a lot of reusable automation layers that help us build end-to-end patching processes in a faster and more reliable manner.”

Apple stresses the flexibility that this offers across a large organization when it comes to allowing teams to choose how they do their patching and updating.

“In the datacenter days, we ran our own centralized patching service, and we picked the patching windows for the entire company,” Apple says. “By moving to more automated self-service, we provide the tools and the teams can pick their own patching windows. That also allowed us to have better conversations, asking the teams if they want to keep doing the patching or if they want to move up the stack and hand it off to us. So, we continue to empower the service teams to do more and give them that flexibility.”
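The self-service patch windows Apple describes can be sketched as a tiny scheduler: each team registers its own maintenance window, and an automation layer only patches a team’s machines inside that window. The `TEAM_WINDOWS` inventory, team names, and VM names below are all hypothetical, purely for illustration.

```python
# Hypothetical inventory: team -> self-selected UTC patch window and VMs needing patches.
TEAM_WINDOWS = {
    "payments": {"window": (2, 4), "vms": ["pay-vm-01", "pay-vm-02"]},
    "reporting": {"window": (22, 23), "vms": ["rep-vm-07"]},
}

def vms_to_patch(now_hour):
    """Return the VMs whose owning team's self-selected window covers `now_hour`."""
    due = []
    for team, cfg in TEAM_WINDOWS.items():
        start, end = cfg["window"]
        if start <= now_hour < end:
            due.extend(cfg["vms"])
    return due

print(vms_to_patch(3))   # → ['pay-vm-01', 'pay-vm-02']
print(vms_to_patch(12))  # → []
```

The design point is the same one the team makes in prose: the platform supplies the mechanism, while each service team owns its own schedule.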

Securing our infrastructure in a cloud-first environment

As security has become an absolute priority for Microsoft, it’s also been a foundational element of our cloud strategy.

Being a cloud-first company has made it easier to be a security-first organization as well.

“The cloud enables us to embed security by design into everything we build,” Nasir says. “At enterprise scale, adopting Zero Trust and strong governance becomes seamless, with controls engineered in from the start, not retrofitted later. That same foundation also prepares us for an AI-first future, where resilience, compliance, and automation are built into every system.”

Cloud-native security features combined with integrated observability allow for better compliance and risk management. Delamarter agrees that the cloud has had huge benefits when it comes to enhancing network security.

“Our code lives in repositories now, and so there’s a tremendous amount of security governance that we’ve shifted upstream, which is huge,” Delamarter says. “There are studies that show that the earlier you can find defects and address them, the less expensive they are to deal with. We’re able to catch security issues much earlier than before.”
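The “shift upstream” Delamarter describes often takes the form of automated checks that run on every proposed change. As a minimal sketch—not Microsoft’s actual tooling, and with deliberately simplistic patterns—here is the kind of pre-merge secret scan that catches a hard-coded credential before it ever reaches production:

```python
import re

# Patterns for common hard-coded credentials (illustrative, not exhaustive).
SECRET_PATTERNS = [
    re.compile(r"password\s*=\s*['\"][^'\"]+['\"]", re.IGNORECASE),
    re.compile(r"AccountKey=[A-Za-z0-9+/=]{20,}"),
]

def scan_diff(diff_lines):
    """Return (line_number, line) pairs in a proposed change that look like secrets."""
    findings = []
    for n, line in enumerate(diff_lines, start=1):
        if any(p.search(line) for p in SECRET_PATTERNS):
            findings.append((n, line))
    return findings

diff = [
    'connection = connect(host)',
    'password = "hunter2"  # oops',
]
print(scan_diff(diff))  # flags line 2
```

Real pipelines use dedicated scanners wired into the repository’s pull-request gates, but the economics are the ones the studies describe: a defect flagged at review time is far cheaper than one found in production.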

“There are less and less manual actions required, and we’re automating a lot of business processes. It basically gives us a huge scale of automation on top of the cloud.”

Nasir is shown in a portrait photo.
Faisal Nasir, principal architect, Microsoft Digital

We use Azure Policy, which helps enforce organizational standards and assess compliance at scale using dashboards and other monitoring tools.

“Azure Policy was a key part of our security approach, because it essentially offers guardrails—a set of rules that says, ‘Here’s the defaults you must use,’” Apple says. “You have to use a strong password, for example, and it has to be tied to an Azure Active Directory ID. We can dictate really strong standards for everything and mandate that all our service teams follow these rules.”
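Azure Policy expresses those guardrails as declarative if/then rules. The toy evaluator below mirrors that shape in Python purely to show the mechanic—the `passwordPolicy` field and the evaluator itself are illustrative, not the real Azure Policy engine or schema:

```python
# A toy guardrail rule in the spirit of Azure Policy's if/then structure:
# if the resource's password policy is anything other than "strong", deny it.
RULE = {
    "if": {"field": "passwordPolicy", "notEquals": "strong"},
    "then": {"effect": "deny"},
}

def evaluate(resource, rule):
    """Return the effect of a single if/notEquals rule against a resource dict."""
    cond = rule["if"]
    matches = resource.get(cond["field"]) != cond["notEquals"]
    return rule["then"]["effect"] if matches else "allow"

print(evaluate({"passwordPolicy": "weak"}, RULE))    # → deny
print(evaluate({"passwordPolicy": "strong"}, RULE))  # → allow
```

The value of the pattern is that the rules are data, not code: a central team can publish and audit defaults while every service team’s deployments are checked against them automatically.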

AI-driven operations in the cloud

Just like its impact on the rest of the technology world, AI is in the process of transforming infrastructure management at Microsoft. Tasks that used to be manual and laborious are being automated in many areas of the company, including network operations.

“AI is creating a new interface of agents that allow users to interact with large ecosystems of applications, and there’s much easier and more scalable integration,” says Nasir. “There are less and less manual actions required, and we’re automating a lot of business processes. Microsoft 365 Copilot, Security Copilot, and other AI tools are giving us shared compute and extensibility to produce different agents. It basically gives us a huge scale of automation on top of the cloud.”

Apple notes that powerful AI tools can be combined with the incredible amount of data that the Microsoft IT infrastructure generates to gain insights that simply weren’t possible before.

“We can integrate AI with our infrastructure data lakes and use tools like Network Copilot to query the data using natural language,” Apple says. “I can ask questions like, ‘How many of our virtual machines need to be patched?’ and get an answer. It’s early, and we’re still experimenting, but the potential to interact with this data in a more automated fashion is exciting.”

Ultimately, Microsoft has become a cloud-first company, and that has allowed us to work toward an AI-first mentality in everything we do.

“Having a complete observability strategy across our infrastructure modernization helps us to make sure that whatever changes we’re making, we have a design-first approach and a cloud-first mindset,” Nasir says. “And now that focus is shifting towards an AI-first mindset as well.”

Key takeaways

Here are some of the benefits we’ve accrued by becoming a cloud-first IT organization at Microsoft:

  • Transformed operations: By moving from our legacy on-premises datacenters, through Azure’s infrastructure as a service (IaaS) offerings, and eventually to a platform as a service (PaaS) DevOps model, we’ve reaped great gains in reliability, efficiency, scalability, and cost savings.
  • A clear view: With 98% of our organization’s IT infrastructure running in the Azure cloud, we have a huge level of observability into our systems—complete oversight into network assessment, monitoring, compliance, patching/updating, and many other aspects of operations.
  • Empowered teams: Operating a cloud-first environment allows us to have a more decentralized approach to IT infrastructure. This means we can offer our business groups and service lines more self-service, cloud-native solutions for monitoring, management, patching, and backup while still maintaining control over and visibility into security and compliance for our entire organization.
  • Seamless updates: The shift to PaaS and serverless networking has enabled a more planned and automated approach to patching and updating our infrastructure, which produces greater efficiency, integration, and speed of deployment.
  • Dependable security: Our cloud environment has allowed us to implement security by design, including tighter control over code repositories and the use of standard security policies across the organization with Azure Policy.
  • Future-proof infrastructure: As we shift to an AI-first mindset across Microsoft, we’re using AI-driven tools to enhance and maintain our native cloud infrastructure and adopt new workflows that will continue to reap dividends for our employees and our organization.  

The post Modernizing IT infrastructure at Microsoft: A cloud-native journey with Azure appeared first on Inside Track Blog.

]]>
20125
The $500-billion challenge: Inside the modernization of Microsoft Treasury’s backend infrastructure http://approjects.co.za/?big=insidetrack/blog/the-500-billion-challenge-inside-the-modernization-of-microsoft-treasurys-backend-infrastructure/ Thu, 19 Jun 2025 16:00:00 +0000 http://approjects.co.za/?big=insidetrack/blog/?p=19379 Editor’s note: This story was created with the help of artificial intelligence. To learn more about how Inside Track is using the power of generative AI to augment our human staff, see our story, Reimagining content creation with our Azure AI-powered Inside Track story bot. Engage with our experts! Customers or Microsoft account team representatives […]

The post The $500-billion challenge: Inside the modernization of Microsoft Treasury’s backend infrastructure appeared first on Inside Track Blog.

]]>
Editor’s note: This story was created with the help of artificial intelligence. To learn more about how Inside Track is using the power of generative AI to augment our human staff, see our story, Reimagining content creation with our Azure AI-powered Inside Track story bot.

When you’re responsible for processing over $500 billion in transactions every year, modernization becomes a mission-critical and highly delicate undertaking, where even the smallest misstep can carry serious financial consequences.

This was the challenge facing the Microsoft Treasury group, whose global operations relied on a patchwork of aging on-premises infrastructure, legacy systems, and leased lines.

Even the most mission-critical and entrenched systems must eventually evolve to meet the modern demands of speed, security, and scale.

A photo of Manikala.

“Modernizing the Treasury Service was not just about adopting new technology. It was about ensuring the uninterrupted operation of vital financial services while collaborating with various teams and meeting all security checks.”

Srinubabu Manikala, principal network engineering manager, Microsoft Digital

For Microsoft Treasury, that time had come—but instead of a straightforward infrastructure upgrade, the company seized the opportunity to go further.

What followed was a bold, strategic transformation that reinvented Microsoft Treasury’s core financial services, phasing out its aging infrastructure and migrating one of the world’s most complex treasury operations to the cloud.

“Modernizing the Treasury Service was not just about adopting new technology,” says Srinubabu Manikala, a principal network engineering manager in Microsoft Digital, the company’s IT organization. “It was about ensuring the uninterrupted operation of vital financial services while collaborating with various teams and meeting all security checks.”

A complex web of legacy, on-premises dependencies

Microsoft Treasury’s legacy infrastructure was initially built around a model where physical presence and dedicated hardware were the norm.

A photo of Shah.

“The legacy network architecture was heavily dependent on on-premises infrastructure and leased lines from third-party partners. This introduced constraints, making the environment complex and difficult to scale.”

Harsh Shah, senior service engineer, Microsoft Digital

It supported a vast network of over 80 banking partners across more than 110 countries, enabling essential financial functions like bank guarantees, supply chain financing, ledger updates, and global cash visibility.

This infrastructure complexity made modernization a highly challenging, costly, and risky endeavor—especially with an architecture that relied heavily on leased lines, aging hardware, and on-premises access methods.

“The legacy network architecture was heavily dependent on on-premises infrastructure and leased lines from third-party partners,” says Harsh Shah, a senior service engineer for Microsoft Digital. “This introduced constraints, making the environment complex and difficult to scale.”

For instance, the “Trading Room” required traders to be on-site to access treasury systems, a model that was quickly disrupted during the COVID-19 pandemic. The growing need for secure remote access only intensified these pressures, especially as seamless integration with cloud-first partners became critical, and downtime was a non-starter. Even brief outages risked financial penalties and could disrupt transactions worth billions.

Navigating the challenge

The Microsoft Digital team responsible for overseeing Microsoft Treasury’s network infrastructure proposed two potential architectural solutions that would meet their modernization requirements while enhancing network infrastructure.

A photo of Griffin.

“Our partners in Treasury ultimately chose the second option, transitioning to a hybrid network. They have a long-term goal of moving entirely to the cloud using Azure.”

Justin Griffin, principal group network engineering manager, Microsoft Digital

The first solution involved refreshing all on-premises infrastructure and implementing robust measures to ensure continuity of services during the transition—a costly but safe bet.

The second, more ambitious solution called for a phased transition to a hybrid network with a long-term goal: go fully cloud-native using Microsoft Azure.

“Our partners in Treasury ultimately chose the second option, transitioning to a hybrid network,” says Justin Griffin, a principal group network engineering manager in Microsoft Digital, who led the team responsible for getting the project off the ground. “They have a long-term goal of moving entirely to the cloud using Azure.”

The decision was influenced by several factors, including the need to eliminate costly hardware and the desire for streamlined network management processes, including the use of Azure for seamless integration with internal and external systems.

Implementing the solution

With the second option chosen, the implementation goals were to eliminate on-premises hardware, cut costs, simplify management, and empower team members and partners to access Microsoft Treasury’s network from anywhere, securely. To this end, Azure would become the new backbone for Microsoft Treasury’s infrastructure.

The modernization effort centered around two cornerstone projects—the SWIFT Alliance SaaS migration and the migration of BlackRock’s Aladdin platform into Azure. The projects would leverage services like Azure VPN for secure remote access, Azure Firewall for enhanced protection, and Azure Virtual WAN (vWAN) for seamless global connectivity.

Modernizing the SWIFT integration

Microsoft Treasury relies on SWIFT for secure international payments. Previously, access to SWIFT required the use of on-premises hardware security modules (HSMs) for attestation and encryption.

The modernization efforts followed a phased migration path:

  • Transitioning connectivity to Azure using vWAN and Site-to-Site VPNs
  • Maintaining security by peering cloud networks with on-prem HSMs
  • Eventually replacing on-premises HSMs with SWIFT’s SaaS-based attestation solution

The result was the retirement of leased lines and aging hardware, a reduced data center footprint, and cost savings of hundreds of thousands of dollars.

Aladdin secure remote access

To enable secure remote access to Aladdin—BlackRock’s investment management platform—Microsoft Digital collaborated with BlackRock and internal finance teams to implement a cloud-native Azure solution based on the following:

  • Azure vWAN hubs with Point-to-Site VPNs for private user access
  • Palo Alto Network virtual appliances for deep traffic monitoring
  • BGP peering over IPsec for encrypted data transfers
  • Geo-redundant routing for automatic failover in case of outages

Before the migration, outages caused by link failures, power surges, and WAN disruptions were not uncommon. But with the new infrastructure in place, Treasury Services users gained secure, uninterrupted access to Aladdin from anywhere. The move to the cloud, reinforced by availability zones and built-in high availability, effectively put an end to those disruptions.
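The geo-redundant failover behavior described above reduces, at its core, to preferring a primary hub and falling back automatically when a health probe fails. The sketch below illustrates that selection logic only—the hub names are hypothetical, and in the real deployment this is handled by Azure vWAN routing and BGP, not application code:

```python
# Hypothetical hub list, in preference order.
HUBS = ["vwan-hub-westus", "vwan-hub-eastus"]

def pick_route(healthy):
    """Return the first healthy hub, mimicking automatic geo-redundant failover.

    `healthy` maps hub name -> bool (the result of a health probe).
    """
    for hub in HUBS:
        if healthy.get(hub, False):
            return hub
    raise RuntimeError("no healthy hub available")

print(pick_route({"vwan-hub-westus": True, "vwan-hub-eastus": True}))   # primary
print(pick_route({"vwan-hub-westus": False, "vwan-hub-eastus": True}))  # failover
```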

A team effort: Reducing project risks and bolstering communications

During the transition, the Microsoft Digital team, the Treasury Services team, and their financial partners all played critical roles in executing a highly coordinated and technically demanding transformation.

To maintain continuity, the Treasury Services team temporarily increased its budget to support parallel operations across both the legacy on-premises environment and the new Azure-based infrastructure.

A photo of Ramirez.

“We needed to make sure the communications were clear and acknowledged by each responsible individual to make sure no errors were made that compromised the availability of the system.”

Lionel Ramirez, senior technical program manager, Microsoft Digital Services

They also deployed new VPN clients to enable secure remote access and eventually migrated their HSMs handling critical SWIFT services to the SWIFT-hosted SaaS platform.

For financial partners, the migration meant shifting from traditional on-premises circuits to modern, cloud-based integrations with Azure. This required close collaboration across multiple internal and external teams. To support this shift, Microsoft Digital built a new Azure network infrastructure that integrated with legacy systems while laying the foundation for the fully cloud-hosted Treasury Services infrastructure.

“We needed to make sure the communications were clear and acknowledged by each responsible individual to make sure no errors were made that compromised the availability of the system,” says Lionel Ramirez, a senior technical program manager for Microsoft Digital Services.

Throughout the migration, the Microsoft Digital team ensured clear, continuous communication and required explicit acknowledgements for every critical step to minimize the risk of error and maintain service availability. All changes were carefully timed to occur after market hours and before trading activity resumed, further reducing the risk of disruption or financial penalties. The project team also adhered to stringent security and compliance requirements at every phase of the transition.

The results: Transformations that drive efficiency, security, and savings

By modernizing Microsoft Treasury Services’ network infrastructure—through migrating Aladdin to Azure and transitioning to SWIFT Alliance’s SaaS platform—the teams’ collaborative efforts achieved clear, measurable success.

These initiatives boosted operational efficiency, strengthened security, and unlocked greater flexibility, all while significantly reducing costs:

  • Substantial cost savings: Over $1 million saved by eliminating the need for new network hardware and licenses.
  • Enhanced operational continuity: Azure’s dynamic failover eliminated outages caused by power surges or link failures.
  • Remote accessibility: Employees no longer need to be physically present in the Trading Room, with secure VPN access enabling global remote work.
  • Greater scalability and agility: Treasury services can now scale in real time to meet evolving partner demands.
  • Lower partner costs: Key financial partners like BlackRock were able to terminate expensive contracts for on-premises circuits, realizing further savings.
  • Lower environmental footprint: A smaller data center footprint reduced energy consumption and maintenance overhead.

By using Azure’s powerful capabilities, Treasury Services is well-prepared to navigate the complexities of today’s financial landscape, ensuring resilience and agility in a rapidly evolving, dynamic environment.

Looking ahead

The modernization of Microsoft Treasury’s network infrastructure is a powerful example of what digital transformation can achieve. While the immediate gains—cost savings, improved reliability, and increased efficiency—were substantial, the true value lies in what this transformation made possible.

“The transition to a cloud-based network using Azure has empowered the Treasury team with the ability to scale efficiently in response to partner-related changes or enhancements, thanks to being fully hosted in the cloud.”

Justin Griffin, principal group network engineering manager, Microsoft Digital

By migrating to Azure and retiring legacy systems, the Treasury Services group, in partnership with the Microsoft Digital team, is now equipped to navigate the evolving financial landscape with greater agility, resilience, and confidence. The project not only addressed technical debt but also laid the groundwork for future innovation.

With a fully cloud-hosted treasury network, Treasury Services can more easily onboard new financial services and partners, scale operations on demand, and take full advantage of Azure’s built-in monitoring and security tools.

“The transition to a cloud-based network using Azure has empowered the Treasury team with the ability to scale efficiently in response to partner-related changes or enhancements, thanks to being fully hosted in the cloud,” Griffin says. “My team can now seamlessly adjust the Azure cloud network infrastructure to meet the Treasury team’s evolving demands and business needs.”

This success story also illustrates the impact of strategic collaboration, deliberate planning, and cutting-edge technology. It proves that even the most complex, deeply embedded financial systems—ones that move hundreds of billions of dollars—can be reinvented. What began as a high-stakes infrastructure challenge has become a model for future transformation.

Microsoft Treasury’s network infrastructure modernization isn’t just a technical achievement; it’s a blueprint for how organizations can evolve. The ultimate goal is a world where eliminating the legacy burden, embracing the cloud, and meeting high standards for speed, security, and scalability is the norm, not the exception.

Key takeaways

Here are some of our top insights from moving Microsoft Treasury Services network infrastructure to Azure:

  • Embrace cloud migration as an achievable goal: Microsoft Treasury Services, in partnership with the Microsoft Digital team, overcame a significant IT challenge by transitioning from an on-premises system to a cloud-based network using Azure.
  • Untangle complexity: Microsoft Treasury Services, in partnership with the Microsoft Digital team, used Azure to eliminate the need for on-premises hardware, significantly reducing system complexity and network maintenance requirements.
  • Create an adaptable partner ecosystem: In an environment where partners and providers increasingly operate in the cloud, the transition bolstered service continuity for critical financial functions and enabled remote access to financial services.
  • Modernization saves time and money: The modernization resulted in substantial cost savings exceeding $1 million, and annual savings of approximately 200 hours in management time.
  • Embrace migration challenges as opportunities: Microsoft Treasury Services looks forward to using Azure’s robust infrastructure to boost agility, cut costs, and fuel future innovation. Each opportunity to upgrade is a chance to innovate.

The post The $500-billion challenge: Inside the modernization of Microsoft Treasury’s backend infrastructure appeared first on Inside Track Blog.

]]>
19379
Reimagining content creation with our Azure AI-powered Inside Track story bot http://approjects.co.za/?big=insidetrack/blog/reimagining-content-creation-with-our-azure-ai-powered-inside-track-story-bot/ Thu, 20 Feb 2025 17:05:00 +0000 http://approjects.co.za/?big=insidetrack/blog/?p=18464 In Microsoft Digital, the company’s IT organization, innovation is the fuel that drives us. Engage with our experts! Customers or Microsoft account team representatives from Fortune 500 companies are welcome to request a virtual engagement on this topic with experts from our Microsoft Digital team. To stay ahead, we are continuously exploring ways to use […]

The post Reimagining content creation with our Azure AI-powered Inside Track story bot appeared first on Inside Track Blog.

]]>
In Microsoft Digital, the company’s IT organization, innovation is the fuel that drives us.

To stay ahead, we are continuously exploring ways to use technology to drive efficiency and value. One way we’re innovating is by using generative AI to accelerate content development for our Inside Track blog.

While there will always be humans in the loop, generative AI has allowed us to embark on a journey to revolutionize how we engage with our customers through compelling stories and case studies. In fact, this very article was created with the help of artificial intelligence.

The vision and challenge

A composite photo of Jones, Pydimarri, Sengar, Boyd, and Velush.
The Microsoft Digital internal Azure AI bot team works together to constantly improve the AI-powered Inside Track story bot. The team includes (left to right) Dwight Jones, Revanth Chandra Pydimarri, Urvi Sengar, Keith Boyd, and Lukas Velush.

The vision for this project was clear: use AI to accelerate content creation, reduce turnaround times, and enable any member of our Microsoft Digital team to quickly share their story with customers. However, the path to realizing this vision was not without its challenges.

“We were venturing into uncharted territory, using AI in its infancy with a limited budget and few resources,” says Dwight D. Jones Sr., a principal product manager on the Frictionless Device team who led the initiative.

To overcome these challenges, the team created a pitch deck based on their vision, built a detailed technical specification, and worked with stakeholders to build support for the project. They formed a virtual team across the Microsoft Digital Inside Track team, the Frictionless Devices team, and the Employee Productivity Engineering team.

With the approval of Microsoft Digital leadership, they launched a small proof of concept that proved successful—the Inside Track content bot. In the pilot, the team was able to use an AI-powered interview bot to produce excellent first drafts of blog posts, saving significant time per story. 

Accelerating publication and reducing costs 

Senior Director Keith Boyd emphasized the dual challenge of telling more stories at a lower overall cost and accelerating publication.

“The key technology that’s powering the bot is OpenAI on Azure,” Boyd says. “It’s already helping us, by making it easier for subject matter experts in Microsoft Digital to share their expertise on their own schedule.” 

By using the bot, the team hopes to increase the number of stories in the pipeline and decrease the time it takes to reach publication.

“Our primary metric is time to publication,” Boyd says. “Can the bot help us move from a six-week cycle for story production to end-to-end story authoring and publication in half that time?” 

Watch this demo of our Inside Track content bot. Customers or Microsoft account team representatives from Fortune 500 companies are welcome to request a virtual engagement on this topic with experts from our Microsoft Digital team. 

Streamlining the process 

Revanth Chandra Pydimarri, a senior product manager on the Inside Track content bot virtual team, highlighted the need to streamline the drafting and publishing process.

“We were wasting resources through inefficient operational processes, and it took a lot of time for us to reach out to relevant subject matter experts, get the content from them, write the story, and eventually publish it,” Pydimarri says. 

The team turned to Microsoft Azure AI, building a bot that uses generative AI capabilities—including retrieval-augmented generation (RAG) through Azure OpenAI On Your Data—to interview subject matter experts (SMEs) at their convenience.
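The retrieval step in a RAG flow like the one described can be sketched roughly as follows. This is an illustrative toy, not the team's actual implementation: the corpus entries, the bag-of-words scoring, and the prompt wording are all hypothetical stand-ins (a production system would use Azure AI Search and embedding-based retrieval).

```python
from collections import Counter
import math

# Hypothetical stand-ins for previously published stories (not real Inside Track data).
STYLE_EXAMPLES = [
    "How we migrated our contact centers to Azure and cut latency.",
    "Reimagining patch management with Azure Update Manager.",
    "Boosting employee connectivity with a virtual WAN architecture.",
]

def _bow(text: str) -> Counter:
    """Tokenize into a bag-of-words frequency count."""
    return Counter(text.lower().split())

def _cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Rank corpus entries by similarity to the query and keep the top k."""
    q = _bow(query)
    return sorted(corpus, key=lambda doc: _cosine(q, _bow(doc)), reverse=True)[:k]

def build_prompt(sme_notes: str, corpus: list[str]) -> str:
    """Ground the draft request in the most relevant retrieved examples (the RAG step)."""
    context = "\n".join(f"- {e}" for e in retrieve(sme_notes, corpus))
    return (
        "Write a first draft in the Inside Track style.\n"
        f"Style examples:\n{context}\n"
        f"SME interview notes:\n{sme_notes}\n"
    )

print(build_prompt("We moved patch management to Azure Update Manager.", STYLE_EXAMPLES))
```

The grounding idea is the same at any scale: retrieve the most relevant prior material, then hand it to the model alongside the SME's input so the draft lands in the house style.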

“We have trained the bot in the style of Inside Track,” Pydimarri says.

The results were impressive, with a time savings of at least five hours per story, while maintaining publication standards.

Capturing input and maintaining style 

Urvi Sengar focused on training the bot to write stories with the same look and feel as human-written stories, and on using the bot to capture input from SMEs and turn it into content. The bot was trained on a comprehensive dataset of previously published stories along with writing guidance from the editorial team, enabling it to emulate the Inside Track team’s storytelling style.

“The bot’s sophisticated content summarization and curation techniques allow it to draft high-quality stories quickly, maintaining the nuanced details and tone provided by the SMEs,” says Sengar, a senior software engineer in Microsoft Digital.

Despite the challenges of transferring subject matter expertise and iteratively improving the process based on user feedback, the project has been well received by stakeholders.

“The bot is now able to generate drafts that are more than 70% complete, significantly reducing the time required to produce stories,” Sengar says.

The team is adding the ability for the bot to verbally interview SMEs by integrating VoiceRAG, which combines Azure AI Search-based retrieval with audio-to-text capabilities. This will enable the bot to transcribe spoken interviews accurately and efficiently, streamlining the content collection phase and creating richer, multi-dimensional stories through a more natural interviewing process.

Combining multiple perspectives 

Lukas Velush, the managing editor of Inside Track, tackled the challenge of using an AI-powered bot to interview multiple SMEs and combine their interviews into one coherent story.

“Our challenge was to continue telling the stories of our subject matter experts across Microsoft Digital, despite having a smaller team and a shrinking budget,” Velush says.

This downward pressure on resources was the inspiration for the bot, which could help create stories at a lower cost and at a faster pace. 

The bot is designed to emulate human writers, interviewing SMEs about their IT work, their challenges, and how they overcame them.

“One of the biggest challenges that we’re still working on is getting the bot to be able to interview multiple people for a story and then weave what each person has to say into a coherent narrative,” Velush says.

The current solution has the bot interview each person separately and then combine the resulting stories into one comprehensive narrative.
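The interview-then-merge workaround boils down to a synthesis pass over the per-SME drafts. A minimal sketch of that final prompt-assembly step (the names, draft text, and instruction wording here are hypothetical, not the team's actual prompt):

```python
def synthesis_prompt(drafts: dict[str, str]) -> str:
    """Fold per-SME drafts into one request for a combined narrative.
    Each draft comes from a separate bot interview; the model is then
    asked to weave them together while keeping quotes attributed."""
    sections = "\n\n".join(f"Draft from {name}:\n{text}" for name, text in drafts.items())
    return (
        "Combine the following single-source drafts into one coherent story, "
        "keeping each person's quotes attributed:\n\n" + sections
    )

# Illustrative placeholder drafts.
print(synthesis_prompt({
    "Ana": "We rebuilt the pipeline on Azure.",
    "Ben": "The change cut our costs in half.",
}))
```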

The initial results are promising, with bot-written stories costing 50% less than human-written stories and being completed at least 30% faster.

“This has been an amazing journey—we’ve learned a ton about AI, about humans, and about what it takes to report and write stories like this,” Velush says. “I see the bot as a great helper that can work right alongside our human writers.” 

The future of AI-driven content creation 

As Microsoft Digital continues to innovate and push boundaries, the Inside Track bot stands as a testament to the power of AI to transform the way we tell our stories. Using AI, the team has shown that it can accelerate the content creation process, enabling delivery of high-quality, engaging content faster and more efficiently than before. 

Looking ahead, the team plans to further enhance the bot’s capabilities. Jones shared that the team plans to use AI to produce multiple types of content with a single entry, such as articles, blog posts, PowerPoint decks, and white papers. They also plan to generate content in multiple languages and to support readers with accessibility needs. 

Boyd emphasized the potential for AI to augment human capabilities.

“One goal is to reassure other content teams that AI-powered bots are not here to replace them, but to augment their capabilities and make them more productive,” he says.

This is just the start.

“This is V1,” Pydimarri says. “We have plans to add more features and also expand the functionality to accommodate other teams’ requirements, too.”

Future enhancements will focus on expanding the bot’s capabilities and functionality. “That will ensure it continues to meet the evolving needs of the Inside Track team and other users,” Sengar says.

Looking to the future

Using AI to accelerate content creation in Microsoft Digital exemplifies the team’s innovative spirit and commitment to customer engagement.

“This project is just the beginning,” Jones says. “We’re excited about the potential of AI to revolutionize how we engage with our customers and look forward to seeing where this journey takes us.” 

The Inside Track bot has demonstrated the potential of AI to streamline processes, improve efficiency, and deliver high-quality, engaging stories that resonate with the audience. It’s a great example of digital transformation in action, paving the way for future innovations. 

As we continue to push the boundaries of what’s possible with AI, we invite you to join us on this journey. Explore the capabilities of the tools and technologies available through Microsoft Azure AI in your own content creation processes and see firsthand how it can enhance productivity and storytelling quality. 

Key Takeaways

We’re using AI to accelerate content generation at Microsoft, and you can too. Here are some highlights from our journey:

  • AI-powered content creation: Microsoft Digital is using generative AI to automate content creation for the Inside Track blog, significantly reducing turnaround times.
  • Successful pilot: The AI-powered chatbot proved successful in a pilot, producing excellent first drafts of blog posts and saving significant time per story.
  • Efficiency and cost reduction: The initiative aims to tell more stories at a lower overall cost and accelerate publication, with Inside Track content bot-written stories costing 50% less and being completed at least 30% faster.
  • Streamlined process: The AI bot interviews subject matter experts (SMEs) at their convenience, eliminating the need for coordinating schedules and saving at least five hours per story.
  • Maintaining quality: The bot is trained to emulate the Inside Track team’s storytelling style, capturing nuanced details and tone provided by SMEs.
  • Adding voice: We’re integrating Azure AI Speech Studio for audio-to-text capabilities, which will make getting interviewed by the Inside Track content bot a more natural experience for our SMEs.
  • Future enhancements: We’re expanding the bot’s functionality to help other Microsoft Digital teams.

The post Reimagining content creation with our Azure AI-powered Inside Track story bot appeared first on Inside Track Blog.

]]>
18464
Migrating from Microsoft Monitoring Agent to Azure Arc and Azure Update Manager at Microsoft http://approjects.co.za/?big=insidetrack/blog/migrating-from-microsoft-monitoring-agent-to-azure-arc-and-azure-update-manager-at-microsoft/ Thu, 26 Sep 2024 16:05:00 +0000 http://approjects.co.za/?big=insidetrack/blog/?p=16574 As organizations grow and transform their IT infrastructures, maintaining consistency in patch management across various environments and cloud architectures has become a priority here at Microsoft and at companies elsewhere. A recent shift from Microsoft Monitoring Agent (MMA) to Microsoft Azure Arc and Microsoft Azure Update Manager (AUM) offers us and others a unified solution […]

The post Migrating from Microsoft Monitoring Agent to Azure Arc and Azure Update Manager at Microsoft appeared first on Inside Track Blog.

]]>
Microsoft Digital stories

As organizations grow and transform their IT infrastructures, maintaining consistency in patch management across various environments and cloud architectures has become a priority here at Microsoft and at companies elsewhere.

A recent shift from Microsoft Monitoring Agent (MMA) to Microsoft Azure Arc and Microsoft Azure Update Manager (AUM) offers us and others a unified solution for both on-premises and cloud resources. This transition is improving our patch orchestration while offering our IT leaders more robust control of our diverse systems internally here in Microsoft Digital, the company’s IT organization.  

Moving to Azure Arc

Granata and Arias appear together in a composite image.
Transitioning from Microsoft Monitoring Agent to Azure Arc ensures streamlined updates across diverse systems, say Cory Granata (left) and Humberto Arias. Granata is a senior site reliability engineer on the Microsoft Digital Security and Compliance team and Arias is a senior product manager in Microsoft Digital.

Shifting from MMA to AUM with Microsoft Azure Arc integration uses Azure Arc as a bridge, enabling management of both on-premises and cloud-based resources from a single control plane.

Historically, the MMA allowed for “dual homing,” where IT teams could connect machines to multiple Microsoft Azure subscriptions with ease. This flexibility streamlined patch management and reporting across different environments.

This feature is particularly useful for us and other large organizations with multiple Azure environments, says Cory Granata, a senior site reliability engineer on the Microsoft Digital Security and Compliance team. However, the newer Azure Arc-based AUM only allows machines to report into one subscription and resource group at a time.

This limitation required some coaching for teams accustomed to MMA’s dual-homing capabilities.

“It wasn’t really an issue or a challenge—just coaching and getting other teams in the mindset that this is how the product was developed,” Granata says.

Azure Arc’s streamlined approach offers an efficient path for IT teams like ours looking to centralize patch management, especially for diverse infrastructures that include cloud and on-premises assets.

Centralizing patch orchestration

One of the standout advantages of Azure Update Manager with Azure Arc is its ability to support patch orchestration across a wide range of environments.

“You have the ability to patch on-premises, off-premises, Azure IaaS, and other resources,” Granata says. “This flexibility extends beyond Azure to cover machines hosted on other platforms, and on-premises Hyper-V servers.”

For organizations with complex infrastructures like ours, this unified approach simplifies operations, reducing the need for multiple tools and platforms to handle updates. Whether managing physical servers in data centers, virtual machines across different cloud providers, or edge computing devices, Azure Arc ensures that patch management is consistent and reliable.

These changes have been very helpful internally here at Microsoft.

“The AUM is our one-stop solution for patching all these different inventories of devices, regardless of where they reside—on-premises, in the cloud, or in hybrid environments,” says Humberto Arias, a senior product manager in Microsoft Digital.

This multi-cloud and edge computing capability offers IT leaders here and elsewhere the flexibility to scale their patch management efforts without being tied to a specific platform.

Migration challenges

While the transition to Azure Arc and AUM has brought us significant benefits, there have been some challenges, particularly around managing expectations for dual-homing capabilities.

The key thing we had to work through was that Azure Arc could only connect to one Azure subscription and resource group at a time. This required additional training for us—we needed to shift our mindset and adopt new workflows. However, after our people understood this limitation, the migration process was smooth.

“Fortunately, it only phones into one subscription and one resource group,” Granata says. “So, wherever it phones in is where all of your patch orchestration logs and everything must go as well, and it can’t connect into another subscription. This centralized approach simplifies reporting and patch management, but it did require some initial adjustments for teams accustomed to multi-subscription environments.”
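The single-homing constraint Granata describes can be modeled as a simple invariant: each machine has exactly one Arc target. This toy sketch (class names and error text are our own illustration, not an Azure API) shows the rule that teams used to MMA's dual homing had to internalize:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ArcTarget:
    """The one place an Arc-enabled machine reports into."""
    subscription_id: str
    resource_group: str

class ArcRegistry:
    """Toy model of the Azure Arc constraint: one subscription and one
    resource group per machine, so no MMA-style dual homing."""

    def __init__(self) -> None:
        self._homes: dict[str, ArcTarget] = {}

    def connect(self, machine: str, target: ArcTarget) -> None:
        current = self._homes.get(machine)
        if current is not None and current != target:
            raise ValueError(
                f"{machine} is already connected to {current}; "
                "Azure Arc does not allow dual homing."
            )
        self._homes[machine] = target

registry = ArcRegistry()
registry.connect("web01", ArcTarget("sub-prod", "rg-patching"))  # first home: allowed
try:
    registry.connect("web01", ArcTarget("sub-lab", "rg-lab"))    # second home: rejected
except ValueError as e:
    print(e)
```

All patch-orchestration logs and reporting then flow to wherever that single home is, which is what makes the centralized dashboard possible.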

Through coaching and training, our teams were able to adapt, and the long-term benefits of a more streamlined system quickly became apparent.

Azure Arc and AUM benefits

Following our migration, our teams began to realize the true benefits of using Azure Arc and AUM for their patch orchestration needs.

“The neat thing about using AUM with patch management and patch orchestration is the centralized control it provides,” Granata says.

For IT teams managing both internal IT assets and lab environments, the ability to oversee patching across a diverse range of systems from one location was a game-changer.

Additionally, the new system provided enhanced reporting and visibility.

While MMA offered flexibility in terms of connecting to multiple subscriptions, Azure Arc’s centralized model makes it easier to manage logs, reports, and patch statuses from a single dashboard.

“We’ve really enjoyed the increased visibility and ease of use that this has given us,” Arias says. “This is particularly valuable for large organizations like ours with distributed environments, where maintaining visibility across multiple systems can be a challenge.”

The integration with Azure Arc also extends the platform’s reach to non-Azure environments, including AWS and other cloud providers. This means that organizations running multi-cloud or hybrid cloud strategies can benefit from a unified patch management system, regardless of where their machines are hosted.

For IT leaders here and elsewhere, these improvements represent a significant step forward in our operational efficiency and security. By centralizing patch management under Azure Arc and AUM, we can ensure that our systems are up-to-date, secure, and compliant, without the need for multiple tools or platforms. We hope sharing our story helps you do the same at your company.

Key Takeaways

Here are some tips for getting started at your company:

  • Azure Arc allows for a centralized management approach, providing IT leaders with a comprehensive view of their infrastructure.
  • Azure Update Manager offers improved patch orchestration and update management, leveraging the latest Azure technologies.
  • While the transition to Azure Arc brings numerous benefits, it also necessitates adjustments, particularly for teams accustomed to dual homing with the Microsoft Monitoring Agent.
  • With some light coaching, teams can easily learn the new system’s capabilities and limitations.

Try it out

Discover more about Azure Arc from the Microsoft Azure product group, including About Azure Arc, Azure Arc for servers, and the Azure Cloud Adoption Framework.

The post Migrating from Microsoft Monitoring Agent to Azure Arc and Azure Update Manager at Microsoft appeared first on Inside Track Blog.

]]>
16574
Running our customer service and support contact centers on Microsoft Azure http://approjects.co.za/?big=insidetrack/blog/running-our-customer-service-and-support-contact-centers-on-microsoft-azure/ Thu, 11 Jul 2024 16:00:00 +0000 http://approjects.co.za/?big=insidetrack/blog/?p=15298 Providing exemplary support is critical to how we empower our customers to achieve more with Microsoft technologies and services. We in Microsoft Digital, the company’s IT organization, recently migrated our global Microsoft customer service network to Microsoft Azure, creating a cloud network-based solution to connect our customers to the support services they need at Microsoft. […]

The post Running our customer service and support contact centers on Microsoft Azure appeared first on Inside Track Blog.

]]>
Providing exemplary support is critical to how we empower our customers to achieve more with Microsoft technologies and services.

Microsoft Digital technical stories

We in Microsoft Digital, the company’s IT organization, recently migrated our global Microsoft customer service network to Microsoft Azure, creating a cloud network-based solution to connect our customers to the support services they need at Microsoft. With the new solution, our customers and customer service team members are connected faster, more reliably, and with improved network performance while maintaining secure and compliant connections.

Building a better customer support network

Scheffler and McNeill are shown in a composite photo.
Eric Scheffler and Elaine McNeill are part of the team at Microsoft that has moved our contact center network infrastructure to Microsoft Azure.

Our Support Experience Group (SxG) within our Cloud + AI division is driving transformation for Microsoft Support solutions, building on Microsoft solutions and infusing cutting-edge innovation to improve customer and agent experiences across all our businesses. Our SxG team provides platforms and services to almost 80,000 Microsoft support advocates, including technical teams and customer support advocates from our network of global contact centers.

Our customer support advocates and partners are integral to maintaining high-quality customer service and support for Microsoft products and solutions. Microsoft customer services handle almost 200,000 support calls daily in 37 different languages worldwide. It’s a diverse and fast-paced environment where connecting support staff to the customer and the Microsoft services they support can be complex.

Our previous global network backbone served us for years through the deployment of key regional central hub sites. Hub sites were connected by physical point-to-point Multiprotocol Label Switching (MPLS) circuits deployed strategically to various sites globally. The MPLS network design is complex, costly, and inflexible.

By redesigning our network with Microsoft Azure Cloud Network solutions at the center, we’re addressing several challenges associated with traditional MPLS networks, such as:

  • Cost and complexity: MPLS networks are often expensive and complex to deploy. 
  • Inflexibility: MPLS is designed for stable, point-to-point connections and can be too rigid for the dynamic and distributed nature of modern cloud computing. It struggles to efficiently handle the traffic patterns created by enterprises running workloads across multiple clouds.
  • Deployment speed: Setting up or modifying MPLS connections can take weeks or even months, which is not conducive to the agility required by businesses today. Cloud networks can be deployed and scaled much more rapidly.
  • Security and encryption: Traditional MPLS doesn’t offer encryption, which is increasingly important as operations move toward the cloud. A cloud network can provide consistent protection regardless of how users connect.

At the core of our transformation is a newly designed global, cloud-based network called the SxG Cloud Network, built on Azure Virtual WAN services specifically for Microsoft customer services. The SxG Cloud Network directly connects advocates at Microsoft contact centers, remote advocates, and internal support teams to the required services.

The SxG Cloud Network provides a highly reliable and high-performing network path into Azure, where support team members can access the tools and environments required to support our customers fully. Within the network, our customer service teams are connected to Azure Virtual Desktops that supply the tools and connectivity they need for troubleshooting, enabling them to connect with Microsoft customers worldwide through virtual private network (VPN) and Azure Virtual Network (VNet) peering.

The SxG Cloud Network resides on the Microsoft Azure tenant and consists of several virtual WAN hubs in key Azure regions across the globe. These hubs use Microsoft Azure Firewall to secure traffic flows within the cloud network through URL filtering, TLS inspection, and intrusion detection and prevention.

The Azure-based hubs provide a single access point that simplifies connectivity and creates a unified and consistent environment for all support advocates. We provide several connectivity methods for our Microsoft customer support advocates irrespective of location, including:

  • Point-to-site (P2S) VPN: This provides connectivity for the remote user working from home.
  • Site-to-site (S2S) VPN: We use S2S VPN to connect Microsoft contact centers using an S2S encrypted tunnel between the partner VPN concentrator and the SxG Cloud Network gateway.
  • VNet peering: We also support peering between a partner Azure tenant and the SxG Cloud Network Azure tenant. VNets on both tenants are directly peered and secured by Azure Firewall.

Point-to-site VPN

Remote Microsoft customer support advocates use Azure P2S VPN to connect directly to Microsoft services in Azure. We maintain several VPN hubs across global Azure regions to ensure that advocates experience the most direct network path to Azure. We use Azure networking components within Azure to connect to the required internal Azure resources.

To ensure that only necessary traffic goes through the VPN, VPN profiles are configured with split-tunnel routing that sends Microsoft-specific traffic to Azure and the rest to the partner network or the public internet. This ensures that users can access local websites in the correct locale and languages they need, while also enabling low-latency access to the Microsoft corporate edge network.

The Azure VPN client facilitates connectivity between the local device and the Azure Virtual WAN gateway hosted in the SxG network. We use a single VPN profile configured with split tunneling for all VPN users. This is made possible by a key feature of Azure Virtual WAN that automatically connects P2S users directly to the closest region. Authentication is required to access the VPN and users authenticate using their Microsoft credentials through Entra ID and multi-factor authentication.
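The split-tunnel decision described above amounts to a prefix match on the destination address: tunneled prefixes go into the VPN, everything else exits locally. A minimal sketch, with hypothetical prefixes standing in for the real routes a production profile would carry:

```python
import ipaddress

# Hypothetical prefixes for Microsoft-bound traffic; a real profile would use
# the published service routes, not this toy list.
TUNNELED_PREFIXES = [
    ipaddress.ip_network("10.0.0.0/8"),      # internal Azure resources
    ipaddress.ip_network("203.0.113.0/24"),  # documentation range standing in for a service prefix
]

def next_hop(destination: str) -> str:
    """Split-tunnel decision: Microsoft-bound traffic enters the P2S VPN,
    everything else goes out the local internet breakout."""
    addr = ipaddress.ip_address(destination)
    if any(addr in net for net in TUNNELED_PREFIXES):
        return "vpn-tunnel"
    return "local-internet"

print(next_hop("10.20.30.40"))    # an internal address: tunneled
print(next_hop("142.250.1.100"))  # a public site: local breakout
```

Keeping the tunneled set small is the design choice that preserves both locale-correct browsing and low-latency access to the corporate edge.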

Site-to-site VPN

S2S VPN connections provide a secure encrypted VPN connection over the public internet to connect our contact centers to Microsoft customer support services in Azure. The contact center partner manages their network and the configuration of the device on their network, which establishes a VPN tunnel to the Azure Virtual WAN gateway hosted in the SxG Cloud Network.

VNet peering

When partners already have an Azure presence, Microsoft can connect the partner Azure network to the virtual WAN using Azure VNet peering. Traffic between the peered VNets doesn’t leave the global Azure backbone network. We use SxG VNet peering to connect VNets in the Microsoft tenant with VNets in the partner’s Azure tenant. VNet peering establishes a high-performance, trusted connection using Azure Firewall in the SxG Cloud Network to provide flow control and traffic protection.
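The three connectivity options above lend themselves to a simple decision rule: individuals get P2S, partners already on Azure get VNet peering, and everyone else gets S2S. This helper is an illustration of how we think about the choice, not a tool we ship:

```python
def connectivity_method(remote_user: bool, has_azure_tenant: bool) -> str:
    """Pick a connection type per the options described above (illustrative only)."""
    if remote_user:
        return "point-to-site VPN"   # individual advocate working from home
    if has_azure_tenant:
        return "VNet peering"        # partner already on Azure: traffic stays on the backbone
    return "site-to-site VPN"        # contact center with its own VPN concentrator

print(connectivity_method(remote_user=False, has_azure_tenant=True))
```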

SxG Cloud Network infrastructure

Graphic showing an architecture diagram of the SxG Cloud Network.
An architecture diagram of the SxG Cloud Network.

Managing connectivity for voice services

Our advocates often support our customers with voice calls, and supporting an effective and efficient voice service is integral to the SxG Cloud Network.

We use Azure ExpressRoute connections to create a direct private network path from all our Azure Virtual WAN gateways to our voice services platform environment using an MPLS backbone. These global connections to our voice services hosted in Azure enable advocates connected to the SxG Cloud Network via P2S, S2S, or VNet peering to use our voice services. The Interhub feature in Azure Virtual WAN also provides seamless connectivity between hubs, ensuring that user network traffic takes the best path with minimal latency while traversing the Microsoft backbone network.

Voice services for Microsoft customer service advocates have now migrated to Azure Communication Services, which is connected to the SxG Cloud Network with ExpressRoute, keeping traffic on the reliable Azure backbone network.

The SxG Cloud Network has modernized how we connect to voice and data services hosted in Azure and can provide advocates access without needing to deploy physical circuits to contact center locations, saving time and money. It also creates a unified network environment, simplifying access points and functionality for our advocates.

With the flexibility and scalability of the SxG Cloud Network, we can manage our bandwidth needs better and have fewer physical circuits that are oversized for the traffic volume. This alone is reducing network costs by more than 60% in specific cases. While exact figures for cost savings and performance improvements can vary depending on the specific circumstances of a deployment, businesses often report significant reductions in total cost of ownership (TCO) and enhancements in network performance when migrating from MPLS to Azure cloud-based solutions.

Looking forward

As we look to the immediate future of the SxG Cloud Network, we’re excited about increasing Azure Communication Services traffic on our network for voice support, further unifying our services and leading to more significant cost savings and efficiency. We’ll continue searching for ways to improve the SxG Cloud Network, including moving the network edge closer to our users with new global virtual WAN hubs. This helps us deliver more effective and easy-to-use support services for Microsoft customers and the advocates who support them.

Key Takeaways

We’re benefiting from the SxG Cloud Network in several areas, including:

  • Experience enhanced support: Connect faster and more reliably to support services thanks to our migration to the Azure-based SxG Cloud Network, ensuring high-quality assistance whenever Microsoft customers need it.
  • Global reach, local service: The SxG Cloud Network spans countries and languages, providing a seamless support experience through a diverse team of professionals ready to assist customers.
  • Secure and simplified connectivity: Azure Virtual WAN offers various connection options, including VPN and VNet, to ensure a secure, direct connection to support resources.
  • Future-ready voice services: Azure Communication Services is creating a more integrated and cost-effective voice support system, enhancing the support experience while maintaining the highest network reliability standards.

The post Running our customer service and support contact centers on Microsoft Azure appeared first on Inside Track Blog.

]]>
15298
Boosting employee connectivity with Microsoft Azure-based VWAN architecture http://approjects.co.za/?big=insidetrack/blog/boosting-employee-connectivity-with-microsoft-azure-based-vwan-architecture/ Fri, 27 Oct 2023 00:01:03 +0000 http://approjects.co.za/?big=insidetrack/blog/?p=12238 Editor’s note: This is the fourth in our ongoing series on moving our network to the cloud internally at Microsoft. Whether our employees are in neighboring cities or different continents, they need to communicate and collaborate efficiently with each other. We designed our Microsoft Azure-based virtual wide-area network (VWAN) architecture to provide high-performance networking across […]

The post Boosting employee connectivity with Microsoft Azure-based VWAN architecture appeared first on Inside Track Blog.

]]>
Microsoft Digital stories

Editor’s note: This is the fourth in our ongoing series on moving our network to the cloud internally at Microsoft.

Whether our employees are in neighboring cities or different continents, they need to communicate and collaborate efficiently with each other. We designed our Microsoft Azure-based virtual wide-area network (VWAN) architecture to provide high-performance networking across our global presence, enabling reliable and security-focused connectivity for all Microsoft employees, wherever they are.

We’re using Azure to strategically position enterprise services such as the campus internet edge in closer proximity to end users and improve network performance. These performance improvements are streamlining our site connectivity worldwide and improving the user experience, increasing user satisfaction and operational efficiency.

We’ve recently piloted this VWAN architecture with our Microsoft Johannesburg office. Our users in Johannesburg were experiencing latency issues and sub-optimal network performance because outbound internet connections were routed through London and Dublin in Europe. In other words, employee traffic had to travel to another continent before it could reach the internet.

To simplify the network path for outgoing internet traffic and reduce latency, we migrated outbound traffic for two network segments in Johannesburg to the Azure Edge using a VWAN connected through Azure ExpressRoute circuits.

The solution relocates the internet edge for Johannesburg to the South Africa North region datacenter in South Africa, using Azure Firewall, Azure ExpressRoute, Azure Connection Monitor, and Azure VWAN. We’ve also evolved our DNS resolution strategy to a hybrid solution that hosts DNS services in Azure, which increases our scalability and resiliency on DNS resolution services for Johannesburg users. We’ve deployed the entire solution adhering to our infrastructure as code strategy, creating a flexible network infrastructure that can adapt and scale to evolving demands on the VWAN.

We’re using Azure Network Watcher connection monitor and Broadcom AppNeta to monitor the entire solution end-to-end. These tools will be critical in evaluating the VWAN’s performance, enabling data-driven decisions for optimizing network performance.

The accompanying high-level diagram outlines our updated network flows. We can support distinct user groups by isolating the guest virtual route forwarding zone (red lines) and the internet virtual route forwarding zone (black lines). This design underscores our commitment to robust outbound traffic control, ensuring a secure and optimized network environment.

Traffic from the Johannesburg office is routed to the internet through the Azure-based VWAN.
Creating efficient and isolated traffic routing to the internet with Azure-based VWAN architecture.

Beth Garrison smiles at a desk with a laptop computer.
Beth Garrison is a cloud software engineer and part of the team that is helping build and maintain Microsoft Digital’s network using infrastructure as code.

We strongly believe our VWAN-based architecture represents the future of global connectivity. The agility, scalability, and resiliency of VWAN infrastructure enables increased collaboration, productivity, and efficiency across our regional offices.

Our pilot in Johannesburg proved that improvements in network performance directly affected user experience. By relocating the network edge to the South Africa region in Azure instead of our datacenter edge in London/Dublin, latency for connections from Johannesburg to other public endpoints in South Africa has dropped from 170 milliseconds to 1.3 milliseconds.
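Picking the egress region closest to the user is the heart of this change. A toy version of that selection, using the pilot's before-and-after figures (the region labels are our own shorthand):

```python
def best_egress(probes: dict[str, float]) -> tuple[str, float]:
    """Choose the egress region with the lowest measured round-trip time."""
    region = min(probes, key=probes.get)
    return region, probes[region]

# RTT measurements from the Johannesburg pilot (milliseconds).
measured = {
    "Europe (London/Dublin) edge": 170.0,  # pre-migration path
    "South Africa North edge": 1.3,        # post-migration path
}

region, rtt = best_egress(measured)
improvement = 100 * (1 - rtt / max(measured.values()))
print(f"{region}: {rtt} ms ({improvement:.1f}% lower latency)")
```

In production this decision is driven by continuous measurements from tools such as Azure Network Watcher connection monitor rather than a static table.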

Latency for other network paths has also improved, but by lesser amounts depending on the specific destination. The improvements were always greater the closer the destination was to Johannesburg, including connectivity paths to the United States and Europe, demonstrating stability and reliability in these critical connections. Significant benefits of the VWAN solution include:

  • Increased scalability and flexibility. Our architecture is built to scale with our business needs. Whether we have a handful of regional buildings or a continent, the VWAN solution can accommodate any dynamic growth pattern. As our service offering expands, we can easily add new locations and integrate them seamlessly into the VWAN infrastructure.
  • Greater network resilience. Continuous connectivity is essential to effective productivity and collaboration. Our architecture incorporates redundancy and failover mechanisms to ensure network resilience. In case of a network disruption or hardware failure, the VWAN solution automatically reroutes traffic to alternative paths, minimizing downtime and maintaining uninterrupted communication.
  • Improved security and compliance. Protecting our data and ensuring compliance is our top priority. Our VWAN-based architecture is secure by design, incorporating industry-leading security measures including encryption, network segmentation, and access controls. We adhere to the highest security standards, helping Microsoft safeguard sensitive information in transit and meet compliance requirements.
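
To make the architecture concrete, here's a minimal sketch of what declaring a virtual WAN and a regional hub looks like in Bicep, the Azure infrastructure-as-code language our team uses. All names, address ranges, and API versions here are illustrative, not our production configuration:

```bicep
// Illustrative only: names, address ranges, and API versions are
// examples, not our production configuration.
param location string = 'southafricanorth'

// The global virtual WAN resource that all regional hubs attach to.
resource wan 'Microsoft.Network/virtualWans@2023-04-01' = {
  name: 'vwan-global'
  location: location
  properties: {
    type: 'Standard'
    allowBranchToBranchTraffic: true
  }
}

// Placing a hub in-region moves the network edge close to users
// (for example, Johannesburg) instead of backhauling traffic to a
// distant datacenter edge.
resource hub 'Microsoft.Network/virtualHubs@2023-04-01' = {
  name: 'hub-johannesburg'
  location: location
  properties: {
    addressPrefix: '10.10.0.0/23'
    virtualWan: {
      id: wan.id
    }
  }
}
```

Because hubs anywhere in the world attach to the same virtual WAN resource, expanding into a new region is largely a matter of deploying another hub like this one.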

We’re currently planning our VWAN-based architecture to span multiple global regions, offering extensive coverage and enabling our employees to connect to their regional and global services through the Azure network backbone as we continue prioritizing network performance to deliver exceptional connectivity for voice, data, and other critical applications.

We’re working to build improvements into the architecture for more optimized routing, improved Quality of Service (QoS) mechanisms, and advanced traffic management techniques to minimize latency, packet loss, and jitter, ensuring robust and low-latency connections to facilitate seamless communication regardless of where our employees are located.

Contact us today to explore how our cutting-edge VWAN-based architecture can transform your organization's networking capabilities and revolutionize how your employees connect and communicate globally. Email us, include a link to this story, and we'll get back to you with more information.

Key Takeaways

  • Assess your organization’s current network performance and needs to understand the challenges remote employees and satellite offices face regarding latency and connectivity.
  • Incorporate Microsoft Azure for improved scalability, flexibility, and resilience so you can strategically position cloud services near end users, improving latency and overall user experience.
  • Adopt an infrastructure-as-code approach to deploy flexible virtual network infrastructures. This streamlines the deployment process and ensures adaptability to ever-changing network demands.
  • Invest in monitoring tools to gain valuable insights into the VWAN’s performance, which will help you make data-driven decisions for optimization.
  • Adopt a VWAN-based architecture that emphasizes security measures such as encryption, network segmentation, and strict access controls. Ensure that the architecture adheres to the highest security standards, safeguarding sensitive information and meeting compliance requirements.
  • Keep updated on advancements in network routing, Quality of Service mechanisms, and traffic management techniques. This will help you minimize latency and ensure robust, low-latency connections, enhancing global communication for your employees.

Try it out
Get started at your company by learning how to deploy Azure VWAN with routing intent and routing policies.

Related links

We'd like to hear from you!
Please share your feedback with us—take our survey and let us know what kind of content is most useful to you.

The post Boosting employee connectivity with Microsoft Azure-based VWAN architecture appeared first on Inside Track Blog.

How we’re deploying our VWAN infrastructure using infrastructure as code and CI/CD http://approjects.co.za/?big=insidetrack/blog/how-were-deploying-our-vwan-infrastructure-using-infrastructure-as-code-and-ci-cd/ Fri, 22 Sep 2023 20:48:18 +0000 http://approjects.co.za/?big=insidetrack/blog/?p=12202 Editor’s note: This is the first in an ongoing series on moving our network to the cloud internally at Microsoft. We’re building a more agile, resilient, and stable virtual wide-area network (VWAN) to create a better experience for our employees to connect and collaborate globally. By implementing a continuous integration/continuous deployment (CI/CD) approach to building […]

The post How we’re deploying our VWAN infrastructure using infrastructure as code and CI/CD appeared first on Inside Track Blog.

Editor’s note: This is the first in an ongoing series on moving our network to the cloud internally at Microsoft.

We’re building a more agile, resilient, and stable virtual wide-area network (VWAN) to create a better experience for our employees to connect and collaborate globally. By implementing a continuous integration/continuous deployment (CI/CD) approach to building our VWAN-based network infrastructure, we can automate the deployment and configuration processes to ensure rapid and reliable delivery of network changes. Here’s how we’re making that happen internally at Microsoft.

Infrastructure as code (IaC)

Jimenez and Scheffler smile in corporate photos that have been merged into a composite image.
Juan Jimenez (left) and Eric Scheffler are part of the team in Microsoft Digital that is helping the company move its network to the cloud. Jimenez is a principal cloud network engineer and Scheffler is a senior cloud network engineer.

Infrastructure as code (IaC) is the fundamental principle underlying our entire VWAN infrastructure. Using IaC, we can develop and implement a descriptive model that defines and deploys VWAN components and determines how the components work together. IaC allows us to create and manage a massive network infrastructure with reusable, flexible, and rapid code deployments.

We created deployment templates and resource modules using the Bicep language in our implementation. These templates and modules describe the desired state of our VWAN infrastructure in a declarative manner. Bicep is a domain-specific language (DSL) that uses declarative syntax to deploy Microsoft Azure resources.

We maintain a primary Bicep template that calls separate modules—also maintained in Bicep templates—to create the desired resources for the deployment in alignment with Microsoft best practices. We use this modular approach to apply different deployment patterns to accommodate changes or new requirements.

With IaC, changes and redeployments are as quick as modifying templates and calling the associated modules. Additionally, parameters for each unique deployment are maintained in separate files from the templates so that different iterations of the same deployment pattern can be deployed without changing the source Bicep code.
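
As a sketch of this modular approach, a primary template can call a versioned module published to a container registry. The registry path, module name, and parameters here are hypothetical, not our actual deployment code:

```bicep
// main.bicep -- illustrative primary template. The registry path,
// module name, and parameters are hypothetical.
param location string
param hubName string
param hubAddressPrefix string

// Each module encapsulates one deployment pattern and is published to
// a container registry, so the same pattern can be reused across
// deployments without copying code.
module vwanHub 'br:contoso.azurecr.io/bicep/modules/vwan-hub:v1' = {
  name: 'deploy-${hubName}'
  params: {
    location: location
    hubName: hubName
    addressPrefix: hubAddressPrefix
  }
}
```

Because the template only declares desired state, rerunning a deployment with an updated module version or new parameter values converges the environment rather than rebuilding it.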

Version control

We use Git-based source control in Microsoft Azure DevOps to track and manage our IaC templates, modules, and associated parameter files. With Azure DevOps, we can maintain a history of changes, collaborate within teams, and easily roll back to previous versions if necessary.

We’re also using pull requests to help track change ownership. Azure DevOps tracks changes and associates them with the engineer who made the change. Azure DevOps is a considerable help with several other version control tasks, such as requiring peer reviews and approvals before code is committed to the main branch. Our code artifacts are published to (and consumed from) a Microsoft Azure Container Registry that allows role-based access control of modules. This enables version control throughout the module lifecycle, and it’s easy to share Azure Container Registry artifacts across multiple teams for collaboration.

Automated testing

Responsible deployment is essential with IaC, where deploying a set of templates can radically alter critical network infrastructure. We've implemented safeguards and tests to validate the correctness and functionality of our code before deployment. These tests include executing the Bicep linter as part of the Azure DevOps deployment pipeline to ensure that all Bicep best practices are being followed and to catch potential issues that could cause a deployment to fail.

We’re also running a test deployment to preview the proposed resource changes before the final deployment. As the process matures, we plan to integrate more testing, including network connectivity tests, security checks, performance benchmarks, and enterprise IP address management (IPAM) integration.

Configuration management

Azure DevOps and Bicep allow us to automate the configuration and provisioning of network objects and services within our VWAN infrastructure. These tools make it easy to define and enforce desired configurations and deployment patterns to ensure consistency across different network environments. Using separate parameter files, we can rapidly deploy new environments in minutes rather than hours without changing the deployment templates or signing in to the Microsoft Azure Portal.
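
For example, a per-environment parameter file of the kind described above might look like this; the file name, template name, and values are hypothetical. Deploying a new environment is then just a matter of pointing the pipeline at a different file:

```bicep
// prod-johannesburg.bicepparam -- illustrative parameter file.
// One file per environment; the deployment template itself never changes.
using 'main.bicep'

param location = 'southafricanorth'
param hubName = 'hub-johannesburg'
param hubAddressPrefix = '10.10.0.0/23'
```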

Continuous deployment

When the infrastructure code passes all validation and tests, the continuous integration (CI) pipeline automatically triggers the deployment process for our VWAN infrastructure. Deployment might involve deploying virtual machines, building and configuring cloud network objects, setting up VPN connections, or establishing network policies.

Monitoring and observability

We’ve implemented robust monitoring and observability practices for how we deploy and manage our VWAN deployment. Monitoring and observability are helping us to ensure that our CI builds are successful, detect issues promptly, and maintain the health of our development process. Here’s how we’re building monitoring and observability in our Azure DevOps CI pipeline:

  • We’re creating built-in dashboards and reports that visualize pipeline status and metrics such as build success rates, durations, and failure details.
  • We’re generating and storing logs and artifacts during builds.
  • We’ve enabled real-time notifications to help us monitor build status for failures and critical events.
  • We’re building-in pipeline monitoring review processes to identify areas for improvement including optimizing build times, reducing failures, and enhancing the stability of our pipeline.

We’re continuing to iterate and optimize our monitoring practices. We’ve created a feedback loop to review the results of our monitoring. This feedback provides the information we need to adjust build scripts, optimize dependencies, automate certain tasks, and further enhance our pipeline.

By implementing comprehensive monitoring and observability practices in our Azure DevOps CI pipeline, we can maintain a healthy development process, catch issues early, and continuously improve the quality of our code and builds.

Rollback and rollforward

We’ve built the ability to rollback or rollforward changes in case of any issues or unexpected outcomes. This is achieved through infrastructure snapshots, version-controlled configuration files, or using features provided by our IaC tool.

Improving through iteration

We’re continuously improving our VWAN infrastructure using information from monitoring data and user experience feedback. We’re also continually assessing new requirements, newly added Azure features, and operational insights. We iterate on our infrastructure code and configuration to enhance security, performance, and reliability.

By following these steps and using CI/CD practices, we can build, test, and deploy our VWAN network infrastructure in a controlled and automated manner, creating a better employee experience by ensuring faster delivery, increased stability, and more effortless scalability.

Key Takeaways
Here are some tips on how you can start tackling some of the same challenges at your company:

  • You can use Infrastructure as code (IaC) to create and manage a massive network infrastructure with reusable, flexible, and rapid code deployments.
  • Using IaC, you can make changes and redeployments quickly by modifying templates and calling the associated modules.
  • Don’t overlook version control. Tracking and managing IaC templates, modules, and associated parameter files is essential.
  • Perform automated testing. It’s necessary to validate the correctness and functionality of the code before deployment.
  • Use configuration management tools to simplify defining and enforcing desired configurations and deployment patterns. This ensures consistency across different network environments.
  • Implement continuous deployment to automate the deployment process for network infrastructure after the code passes all validation and tests.
  • Use monitoring and observability best practices to help identify issues, track performance, troubleshoot problems, and ensure the health and availability of the network infrastructure.
  • Building rollback and roll-forward capabilities enables you to quickly respond to issues or unexpected outcomes.

Try it out
Try using a Bicep template to manage your Microsoft Azure resources.

Related links

We'd like to hear from you!
Please share your feedback with us—take our survey and let us know what kind of content is most useful to you.

Doing more with less: Optimizing shadow IT through Microsoft Azure best practices http://approjects.co.za/?big=insidetrack/blog/doing-more-with-less-optimizing-shadow-it-through-microsoft-azure-best-practices/ Wed, 07 Jun 2023 13:43:59 +0000 http://approjects.co.za/?big=insidetrack/blog/?p=11267 You don’t know what you don’t know. In the world of IT, illuminating those hidden areas helps stave off nasty surprises. When elements of IT infrastructure are shrouded in mystery, it can lead to security vulnerabilities, non-compliance, and poor budget management. That’s the trouble with shadow IT—a term for any technical infrastructure that conventional IT […]

The post Doing more with less: Optimizing shadow IT through Microsoft Azure best practices appeared first on Inside Track Blog.

You don’t know what you don’t know. In the world of IT, illuminating those hidden areas helps stave off nasty surprises.

When elements of IT infrastructure are shrouded in mystery, it can lead to security vulnerabilities, non-compliance, and poor budget management. That’s the trouble with shadow IT—a term for any technical infrastructure that conventional IT teams and engineers don’t govern.

At Microsoft, we’re on a journey to increase our shadow IT maturity, resulting in fewer vulnerabilities and increased efficiencies. To get there, we’re leveraging tools and techniques we’ve developed through our core discipline of Microsoft Azure optimization.

[See how we’re doing more with less internally at Microsoft with Microsoft Azure. Learn how we’re transforming our internal Microsoft Azure spend forecasting.]

The challenges of shadow IT

Shadow IT is the set of applications, services, and infrastructure that teams develop and manage outside of defined company standards.

It typically crops up when engineering teams are unable to support their non-engineering partners. That situation may arise from a lack of available engineering capacity or the need for specialized domain solutions. On top of those circumstances, modern tools enable citizen developers to stand up low-code/no-code solutions that enable businesses to reduce their dependency on traditional engineering organizations.

Six corporate function teams have been involved in creating shadow IT environments: business development, legal, finance, human resources, and our consumer and commercial marketing and sales organizations.

Many of the solutions they’ve developed make strong business sense—as long as they’re secure and efficient. That’s where our Microsoft Digital (MSD) team comes in.

Three years ago, our biggest driver was getting visibility into the shadow IT estate and finding ways to secure it. Now we’re at a point where we’re looking for cost savings—that’s a natural progression.

—Myron Wan, principal product manager, Infrastructure and Engineering Services team

Over the last few years, our IT experts have been working with the shadow IT divisions to increase the maturity of the solutions they’ve developed, taking them from unsanctioned toolsets lurking in the shadows to well-governed, compliant, and secure assets they can safely use to advance our business goals.

The shadow IT journey leading from “unsanctioned” through “fundamentals,” “emerging,” “advanced,” and “optimized.”
Our journey toward shadow IT maturity has been steadily progressing through unsanctioned usage, building fundamentals, then emerging, advanced, and optimized maturity.

Now that these shadow IT solutions are more secure and compliant, we’ve turned our attention to efficiency and optimization to ensure we’re able to do as much as possible with the least necessary budget expenditure.

“Three years ago, our biggest driver was getting visibility into the shadow IT estate and finding ways to secure it,” says Myron Wan, principal product manager within the Infrastructure and Engineering Services (IES) team. “Now we’re at a point where we’re looking for cost savings—that’s a natural progression.”

Because many of our shadow IT solutions leverage Microsoft Azure subscriptions, that was a natural place to start the optimization work.

Azure best practices, shadow IT efficiency

Fortunately, we have robust discipline around optimizing Microsoft Azure spend in conventional IT and engineering settings. Microsoft Azure Advisor, available through the Microsoft Azure Portal, has been providing optimization recommendations and identifying overspend for subscribers both within Microsoft and in our customers’ organizations for years.

The plan was to take applicable recommendations that we use in our core engineering organizations and distribute them to the shadow IT divisions.

—Trey Morgan, principal product manager, MSD FinOps

Morgan poses for picture standing in front of a wall outside.
Trey Morgan is part of a cross-disciplinary technical and FinOps team helping optimize shadow IT at Microsoft.

Internally, we’ve added layers that help streamline the optimization process. One, called CloudFit, draws from a library of optimization recommendations, which are tailored to the specific needs of the teams we support. Then we use Service 360, our internal notification center that flags actions in need of addressing for our engineering teams, to route those recommendations to subscription owners within MSD, product groups, and business groups.

Optimization tickets then enter their queue and progress through open, active, and resolved statuses. It’s a standard method for creating and prioritizing engineering tasks, and Microsoft customers could accomplish a similar result by building a bridge between Microsoft Azure Advisor and their own ticketing tool, whether that’s Jira, ServiceNow, or others.

“We have an existing set of cost optimization recommendations that we use for a variety of different technologies like Azure Cosmos DB and SQL,” says Trey Morgan, principal product manager for MSD FinOps. “The plan was to take applicable recommendations that we use in our core engineering organizations and distribute them to the shadow IT divisions.”

Getting there was a matter of establishing visibility and building culture.

Shining a light on shadow IT spend

Many of the optimization issues within shadow IT divisions arose because non-engineers and non-developers were unfamiliar with, or untrained on, subscription-based software. They might not have the background or expertise to set up their subscriptions correctly, or to ensure that subscriptions terminate after they've served their purpose.

In some cases, vendors or contractors may have set up processes and then moved on once their engagement was complete. Each of these scenarios had the potential for suboptimal Azure spend.

Providing visibility into these issues was relatively simple. Because all Microsoft Azure subscriptions across our organization are searchable through our company-wide inventory management system and sortable by department, engineers were able to locate all the subscriptions belonging to shadow IT divisions. From there, they simply had to apply CloudFit recommendations to those subscriptions and loop them through Service 360.

Our people now have the information they need to act—our organizational leaders can visit their Service 360 dashboard or can review their action summary report to see what they can do to cut their costs. That’s where culture and education came into the equation.

“Culture is always the number-one challenge when items aren’t actually owned by a core engineering team,” Wan says. “When you have teams that are more about generating revenue or managing corporate processes, a lot of what we have to deal with is education.”

It wasn’t just educating teams about Microsoft Azure optimization techniques. CloudFit and Service 360 provided a lot of the guidance those teams would need to get the job done. To a great degree, non-engineering employees needed to build the discipline of receiving and resolving tickets like a developer or engineer would.

But through direct communications from FinOps tools and support from Wan’s colleagues in engineering, we’ve been meeting our goal of optimizing Azure spend in shadow IT divisions. In the first six months of this solution’s availability, we’ve saved $1 million thanks to various optimizations.

Microsoft Azure savings and organizational discipline

Shadow IT will always exist in some form or another, so this journey isn’t just about remedying past inefficiencies. It’s also about building a culture of optimization and best practices across shadow IT divisions as they use their Microsoft Azure subscriptions moving forward.

With these solutions and practices in place, we’ve moved on from a “get clean” and “stay clean” culture to one where we “start clean.”

—Qingsu Wu, principal program manager, IES

“As we get more mature and divisions build up their muscles, we’re actually getting to an ongoing state of optimization,” says Feng Liu, principal product manager with IES. “As we build up that culture and that practice, folks are becoming more aware and taking more ownership and accountability.”

Some shadow IT divisions are even going beyond FinOps recommendations. For example, our commercial sales and marketing organization uses shadow IT solutions so extensively and is so keen to optimize their budget that they’ve automated the implementation of recommendations and created their own internal FinOps team.

“The whole vision of our shadow IT program is helping business teams to be self-accountable and sustainable,” says Qingsu Wu, principal program manager for the Infrastructure and Engineering Services (IES) team. “With these solutions and practices in place, we’ve moved on from a ‘get clean’ and ‘stay clean’ culture to one where we ‘start clean.’”

It’s all part of building a more effective culture and practice to do more with less.

Key Takeaways

  • Understand your inventory. Spend time linking your organizational hierarchy to your Azure resources.
  • Get to a confident view of your estate and your data. It’s crucial.
  • Don’t be overly prescriptive. Be open to how you’re going to approach the situation.
  • Build sustainability into your efforts by getting non-engineering teams more comfortable with regular engineering practices and learning from each other.
  • Don’t overlook small wins. When they scale out across an entire organization, they can produce substantial savings.

Related links

Implementing Microsoft Azure cost optimization internally at Microsoft http://approjects.co.za/?big=insidetrack/blog/implementing-microsoft-azure-cost-optimization-internally-at-microsoft/ Tue, 07 Jun 2022 17:35:40 +0000 http://approjects.co.za/?big=insidetrack/blog/?p=9389 We periodically update our stories, but we can’t verify that they represent the full picture of our current situation at Microsoft. We leave them on the site so you can see what our thinking and experience was at the time. Our Microsoft Digital team is aggressively pursuing Microsoft Azure cost optimization as part of our […]

The post Implementing Microsoft Azure cost optimization internally at Microsoft appeared first on Inside Track Blog.

We periodically update our stories, but we can’t verify that they represent the full picture of our current situation at Microsoft. We leave them on the site so you can see what our thinking and experience was at the time.

Our Microsoft Digital team is aggressively pursuing Microsoft Azure cost optimization as part of our continuing effort to improve the efficiency and effectiveness of our enterprise Azure environment here at Microsoft and for our customers.

By adopting data-driven cost-optimization techniques, investing in central governance, and driving modernization efforts throughout our Microsoft Azure environment, we've made our environment—one of the largest enterprise environments hosted in Azure—a cost-efficient blueprint that customers can look to for lessons on how to lower their own Azure costs.

We began our digital transformation journey in 2014 with the bold decision to migrate our on-premises infrastructure to Microsoft Azure so we could capture the benefits of a cloud-based platform—agility, elasticity, and scalability. Since then, our teams have progressively migrated and transformed our IT footprint to the largest cloud-based infrastructure in the world—we host more than 95 percent of our IT resources in Microsoft Azure.

The Microsoft Azure platform has expanded over the years with the addition of hundreds of services, dozens of regions, and innumerable improvements and new features. In tandem, we’ve increased our investment in Azure as our core destination for business solutions at Microsoft. As our Azure footprint has grown, so has the environment’s complexity, requiring us to optimize and control our Azure expenditures.

Optimizing Microsoft Azure cost internally at Microsoft

Our Microsoft Azure footprint follows the resource usage of a typical large-scale enterprise. In the past few years, our cost-optimization efforts have been more targeted as we attempted to minimize the rising total cost of ownership in Azure due to several factors, including increased migrations from on-premises and business growth. This focus on optimization instigated an investment in tools and data insights for cost optimization in Azure.

The built-in tools and data that Microsoft Azure provides form the core of our cost-optimization toolset. We derive all our cost-optimization tools and insights from data in Microsoft Azure Advisor, Microsoft Azure Cost Management and Billing, and Microsoft Azure Monitor. We’ve also implemented design optimizations based on modern Azure resource offerings. We extract recommendations from Azure Advisor across the different Azure service categories and push those recommendations into our IT service management system, where the services’ owners can track and manage the implementation of recommendations for their services.

Understanding holistic optimization

As the first and largest adopter of Microsoft Azure, we've developed best practices for engineering and maintenance in Azure that support not only cost optimization but also a comprehensive approach to capturing the benefits of cloud computing. We developed and refined the Microsoft Well-Architected Framework as a set of guiding tenets for Azure workload modernization and a standard for modern engineering in Azure.

Cost optimization is one of five pillars in the Well-Architected Framework that work together to support an efficient and effective Azure footprint; the other four are reliability, security, operational excellence, and performance efficiency. Cost optimization in Azure isn't only about reducing spending. In Azure's pay-for-what-you-use model, using only the resources we need, when we need them, in the most efficient way possible is the critical first step toward optimization.

Optimization through modernization

Reducing our dependency on legacy application architecture and technology was an important part of our first efforts in cost optimization. We migrated many of our workloads from on-premises to Microsoft Azure by using a lift-and-shift method: imaging servers or virtual machines exactly as they existed in the datacenter and migrating those images into virtual machines hosted in Azure. Moving forward, we've focused on transitioning those infrastructure as a service (IaaS) based workloads to platform as a service (PaaS) components in Azure to modernize the infrastructure on which our solutions run.

Focus areas for optimization

We’ve maintained several focus areas for optimization. Ensuring the correct sizing for IaaS virtual machines was critical early in our Microsoft Azure adoption journey, when those machines accounted for a sizable portion of our Azure resources. We currently operate at a ratio of 80 percent PaaS to 20 percent IaaS; to achieve this ratio, we’ve migrated workloads from IaaS to PaaS wherever feasible. This means transitioning away from workloads hosted within virtual machines and toward more modular services such as Microsoft Azure App Service, Microsoft Azure Functions, Microsoft Azure Kubernetes Service, Microsoft Azure SQL, and Microsoft Azure Cosmos DB. PaaS services like these offer better native optimization capabilities than virtual machines, such as automatic scaling and broader service integration.

As the number of PaaS services has increased, automating scalability and elasticity across PaaS services has been a large part of our cost-optimization process. Data storage and distribution has been another primary focus area as we modify scaling, size, and data retention configuration for Microsoft Azure Storage, Azure SQL, Azure Cosmos DB, Microsoft Azure Data Lake, and other Azure storage-based services.

Implementing practical cost optimization

While Microsoft Azure Advisor provides most recommendations at the individual service level—Microsoft Azure Virtual Machines, for example—implementing these recommendations often takes place at the application or solution level. Application owners implement, manage, and monitor recommendations to ensure continued operation, account for dependencies, and keep the responsibility for business operations within the appropriate business group at Microsoft.

For example, we performed a lift-and-shift migration of our on-premises virtual lab services into Microsoft Azure. The resulting Azure environment used IaaS-based Azure virtual machines configured with nested virtualization. The initial scale was manageable using the nested virtualization model. However, the Azure-based solution was more convenient for hosting workloads than the on-premises solution, so adoption began to increase exponentially, which made management of the IaaS-based solution more difficult. To address these challenges, the engineering team responsible for the virtual lab environment re-architected the nested virtual machine design to incorporate a PaaS model using microservices and Azure-native capabilities. This design made the virtual lab environment more easily scalable, efficient, and resilient. The re-architecture addressed the functional challenges of the IaaS-based solution and reduced Azure costs for the virtual lab by more than 50 percent.

In another example, an application used Microsoft Azure Functions with the Premium App Service Plan tier to account for long-running functions that wouldn’t run properly without the extended execution time enabled by the Premium tier. The engineering team converted the logic in the Function Apps to use Durable Functions, an Azure Functions extension, and more efficient function-chaining patterns. This reduced execution time to less than 10 minutes, which allowed the team to switch the Function Apps to the Consumption tier, reducing cost by 82 percent.
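
The application-code change to Durable Functions is outside the scope of an infrastructure template, but the resulting plan-tier switch shows up directly in IaC. Here's a hedged Bicep sketch of the hosting-plan change; resource names are hypothetical, while the SKU values are the standard Azure identifiers for these tiers:

```bicep
// Illustrative: the hosting-plan change behind the 82 percent savings.
// Resource names are hypothetical.
param location string = resourceGroup().location

// Before: an Elastic Premium plan was required for long-running functions.
//   sku: { name: 'EP1', tier: 'ElasticPremium' }

// After: with execution time under 10 minutes, the Consumption tier
// (billed per execution rather than per allocated instance) suffices.
resource functionPlan 'Microsoft.Web/serverfarms@2022-09-01' = {
  name: 'plan-functions-consumption'
  location: location
  sku: {
    name: 'Y1'
    tier: 'Dynamic'
  }
}
```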

Governance

To ensure effective identification and implementation of recommendations, governance in cost optimization is critical for our applications and the Microsoft Azure services that those applications use. Our governance model provides centralized control and coordination for all cost-optimization efforts. Our model consists of several important components, including:

  • Microsoft Azure Advisor recommendations and automation. Advisor cost management recommendations serve as the basis for our optimization efforts. We channel Advisor recommendations into our IT service management and Microsoft Azure DevOps environment to better track how we implement recommendations and ensure effective optimization.
  • Tailored cost insights. We’ve developed dashboards that identify the costliest applications and business groups and highlight opportunities for optimization. The data from these dashboards helps engineering leaders observe and track important Azure cost components in their service hierarchy to ensure that optimization is effective.
  • Improved Microsoft Azure budget management. We perform our Azure budget planning by using a bottom-up approach that involves our finance and engineering teams. Open communication and transparency in planning are important, and we track forecasts for the year alongside actual spending to date to enable accurate adjustments to spending estimates and closely track our budget targets. Relevant and easily accessible spending data helps us identify trend-based anomalies to control unintentional spending that can happen when resources are scaled or allocated unnecessarily in complex environments.
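As a rough illustration of the first component, routing Advisor cost recommendations into per-team work tracking might look like the following plain-Python sketch. The record shapes and field names are invented for illustration and are not the Azure Advisor API schema.

```python
# Hypothetical sketch: filter Advisor output down to cost
# recommendations and bucket them by owning team, mirroring how
# recommendations flow into IT service management and Azure DevOps.
from collections import defaultdict

recommendations = [
    {"category": "Cost", "team": "Commerce", "impact": "High",
     "text": "Right-size underutilized VM vm-web-01"},
    {"category": "Security", "team": "Commerce", "impact": "High",
     "text": "Enable MFA for admin accounts"},
    {"category": "Cost", "team": "Finance", "impact": "Medium",
     "text": "Delete unattached disk disk-tmp-7"},
]

def cost_backlog(recs):
    """Keep only cost recommendations and group them by owning team."""
    backlog = defaultdict(list)
    for rec in recs:
        if rec["category"] == "Cost":
            backlog[rec["team"]].append(rec["text"])
    return dict(backlog)
```

Grouping by owning team keeps implementation accountability local while the recommendations themselves flow from a central source.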

Implementing a governance solution has enabled us to realize considerable savings by applying a single change to Microsoft Azure resources across our entire footprint. For example, we implemented a recommendation to convert Microsoft Azure SQL Database instances from the Standard database transaction unit (DTU)-based tier to the General Purpose serverless tier by using a simple Microsoft Azure Resource Manager template and the auto-pause capability. The configuration change reduced costs by 97 percent.
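For illustration, the relevant portion of such a Resource Manager template might look like the following. The resource names, region, capacities, and API version here are placeholders rather than our actual configuration; the `GP_S_Gen5` SKU family selects the General Purpose serverless tier, and `autoPauseDelay` (in minutes) enables auto-pause.

```json
{
  "type": "Microsoft.Sql/servers/databases",
  "apiVersion": "2021-11-01",
  "name": "example-server/example-db",
  "location": "westus2",
  "sku": {
    "name": "GP_S_Gen5_2",
    "tier": "GeneralPurpose"
  },
  "properties": {
    "autoPauseDelay": 60,
    "minCapacity": 0.5
  }
}
```

With auto-pause enabled, compute is billed only while the database is active, which is where the savings for intermittently used databases come from.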

Benefits of Microsoft Azure

Ongoing optimization in Microsoft Azure has enabled us to capture the value of Azure to help increase revenue and grow our business. Our yearly budget for Azure has remained almost static since 2014, when we hosted most of our IT resources in on-premises datacenters. Over that period, Microsoft has grown by more than 20 percent.

Our recent optimization efforts have resulted in significantly reduced spending across numerous Microsoft Azure services. Examples, in addition to those already mentioned, include:

  • Right-sizing Microsoft Azure virtual machines. We generated more than 300 recommendations for VM size changes to increase cost efficiency. These recommendations included switching to burstable virtual machine sizes and accounted for a 15 percent cost savings.
  • Moving virtual machines to the latest generation of virtual machine sizes. Moving from older D-series and E-series VM sizes to their current counterparts generated almost 2,500 recommendations and a cost savings of approximately 30 percent.
  • Implementing Microsoft Azure Data Explorer recommendations. We implemented more than 200 recommendations for Microsoft Azure Data Explorer optimization, resulting in significant savings.
  • Incorporating Cosmos DB recommendations. More than 170 Cosmos DB recommendations reduced cost by 11 percent.
  • Implementing Microsoft Azure Data Lake recommendations. More than 30 Azure Data Lake recommendations combined to reduce costs by approximately 15 percent.
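To make the right-sizing idea in the first bullet concrete, the following toy heuristic flags candidates from CPU telemetry. The thresholds are invented for illustration and are not Advisor’s actual logic.

```python
# Illustrative right-sizing heuristic (invented thresholds, not
# Advisor's real algorithm): a VM with a consistently low average CPU
# and modest peaks is simply oversized, while a VM with a low baseline
# but bursty peaks is a candidate for a burstable (B-series) size.

def size_recommendation(avg_cpu_pct, p95_cpu_pct):
    if avg_cpu_pct < 10 and p95_cpu_pct < 40:
        return "downsize"             # steadily underutilized
    if avg_cpu_pct < 20 and p95_cpu_pct >= 40:
        return "switch-to-burstable"  # low baseline, bursty peaks
    return "keep-current-size"
```

Burstable sizes bank CPU credits during quiet periods and spend them during spikes, which is why they suit the low-baseline, bursty profile rather than sustained load.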

Key Takeaways

Cost optimization in Microsoft Azure can be a complicated process that requires significant effort from several parts of the enterprise. The following are some of the most important lessons that we’ve taken from our cost-optimization journey:

Implement central governance with local accountability

We implemented a central audit of our Microsoft Azure cost-optimization efforts to help improve our Azure budget-management processes. This audit enabled us to identify gaps in our methods and make the necessary engineering changes to address those gaps. Our centralized governance model includes weekly and monthly leadership team reviews of our optimization efforts. These meetings allow us to align our efforts with business priorities and assess the impact across the organization. The service owner still owns and is accountable for their optimization effort.

Use a data-driven approach

Using optimization-relevant metrics and monitoring from Microsoft Azure Monitor is critical to fully understanding the necessity and impact of optimization across services and business groups. Accurate and current data is the basis for making timely optimization decisions that provide the largest cost savings possible and prevent unnecessary spending.

Be proactive

Real-time data and effective cost optimization enable proactive cost-management practices. Cost-management recommendations provide no financial benefit until they’re implemented. Getting from recommendation to implementation as quickly as possible while maintaining governance over the process is the key to maximizing cost-optimization benefits.

Adopt modern engineering practices

Cost optimization is one of the five pillars of the Microsoft Azure Well-Architected Framework, and each pillar functions best when supported by proper implementation of the other four. Adopting modern engineering practices that support reliability, security, operational excellence, and performance efficiency will help to enable better cost optimization in Microsoft Azure. This includes using modern virtual machine sizes where virtual machines are needed and architecting for Azure PaaS components such as Microsoft Azure Functions, Microsoft Azure SQL, and Microsoft Azure Kubernetes Service when virtual machines aren’t required. Staying aware of new Azure services and changes to existing functionality will also help you recognize cost-optimization opportunities as soon as possible.

Looking forward to more optimization

As we continue our journey, we’re focusing on refining our efforts and identifying new opportunities for further cost optimization in Microsoft Azure. The continued modernization of our applications and solutions is central to reducing cost across our Azure footprint. We’re working toward ensuring that we’re using the optimal Azure services for our solutions and building automated scalability into every element of our Azure environment. Using serverless and containerized workloads is an ongoing effort as we reduce our investment in the IaaS components that currently support some of our legacy technologies.

We’re also improving our methods for decentralizing optimization recommendations to enable our engineers and application owners to make the best choices for their environments while still adhering to central governance and standards. This includes automating the detection of anomalous behavior in Microsoft Azure billing by using service-wide telemetry and logging, data-driven alerts, root-cause identification, and prescriptive guidance for optimization.
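The billing-anomaly detection described above can be sketched with a simple statistical check. This is a hedged, minimal illustration using invented numbers, not our actual telemetry pipeline: flag any day whose cost deviates from the recent rolling mean by more than a few standard deviations.

```python
# Toy trend-based spend anomaly detector: compare each day's cost to
# the mean and standard deviation of the preceding window of days.
from statistics import mean, stdev

def anomalous_days(daily_cost, window=7, k=3.0):
    """Return the indices of days whose cost deviates from the prior
    window's mean by more than k standard deviations."""
    flagged = []
    for i in range(window, len(daily_cost)):
        history = daily_cost[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if sigma > 0 and abs(daily_cost[i] - mu) > k * sigma:
            flagged.append(i)
    return flagged
```

A flagged day would then feed the alerting and root-cause steps: the anomaly points to when spending jumped, and resource-level billing data narrows down what scaled or was allocated unexpectedly.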

Microsoft Azure optimization is a continuous cycle. As we further refine our optimization efforts, we learn from what we’ve done in the past to improve what we’ll do in the future. Our footprint will continue to grow in the years ahead, and our cost-optimization efforts will expand accordingly to ensure that our business is capturing every benefit that the Azure platform provides.

Related links

We'd like to hear from you!

Want more information? Email us and include a link to this story and we’ll get back to you.

Please share your feedback with us—take our survey and let us know what kind of content is most useful to you.

The post Implementing Microsoft Azure cost optimization internally at Microsoft appeared first on Inside Track Blog.
