Network Security Archives - Inside Track Blog
http://approjects.co.za/?big=insidetrack/blog/tag/network-security/
How Microsoft does IT

Microsoft CISO advice: Read our four tips for securing your network
http://approjects.co.za/?big=insidetrack/blog/microsoft-ciso-advice-read-our-four-tips-for-securing-your-network/
Thu, 19 Mar 2026 16:00:00 +0000

The post Microsoft CISO advice: Read our four tips for securing your network appeared first on Inside Track Blog.

Geoff Belknap, CVP and operating CISO for Core and Enterprise, shares four key practices your business can use to be prepared for managing network security incidents.

“Knowing where devices are, who owns them, and what they’re supposed to be doing is pretty important in the middle of an incident,” Belknap says.

Watch this video to see Geoff Belknap discuss how we’re securing our network at Microsoft. (For a transcript, please view the video on YouTube: https://www.youtube.com/watch?v=nWPaaTHGE-M.)

Key takeaways

Here are best practices you can use to secure your network:

  • Build a complete inventory. Keep track of what your network devices are, who owns them, and what they do.
  • Capture robust telemetry. Make sure your operational teams have the tools they need to see and analyze access and authentication logs.
  • Use dynamic access control. Manage who can send packets on the corporate network by applying policies.
  • Deprecate old network assets. Cyberattackers know to look for older, unpatched network devices. You can reduce the attack surface by replacing older devices.

Microsoft CISO advice: Explore our four tips for securing your customer support ecosystem
http://approjects.co.za/?big=insidetrack/blog/microsoft-ciso-advice-explore-our-four-tips-for-securing-your-customer-support-ecosystem/
Thu, 12 Mar 2026 16:00:00 +0000

The post Microsoft CISO advice: Explore our four tips for securing your customer support ecosystem appeared first on Inside Track Blog.

Microsoft business operations teams know all too well that cyberattackers seek to exploit customer support pathways. Tools that can unlock customer accounts or aid in troubleshooting issues in complex environments are a rich target.

“The path attackers really like to use is to compromise support tooling and laterally move to your core tooling,” says Raji Dani, Deputy Chief Information Security Officer (CISO) for Microsoft business operations.

Dani and her team focus on understanding and mitigating the risks within customer support operations. In this video, she shares principles and practices for every business that relies on online tools in their customer support ecosystem.

Watch this video to see Raji Dani discuss four customer support ecosystem security principles. (For a transcript, please view the video on YouTube: https://www.youtube.com/watch?v=rJ87jjz3vvo.)

Key takeaways

Here are best practices you can apply to your customer support ecosystem:

  • Create dedicated and isolated support identities. Use standardized support identities with phish-resistant multifactor authentication based in a separate identity ecosystem.
  • Implement least privilege and enforce device protection. Grant only the access needed for a given task, and require support work to happen from managed, protected devices.
  • Ensure tooling does not have highly privileged access to customer data. Architect secure tools and manage service-to-service trust and highly privileged access.
  • Implement strong telemetry. Anomalous patterns in logs and telemetry data are often the first clue a cyberattack is underway.
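The telemetry bullet above is the kind of check that can be automated. Here is a minimal sketch, assuming a toy access log and a simple median-based baseline (both hypothetical; real detection would use your SIEM's anomaly models):

```python
from collections import Counter
from statistics import median

# Hypothetical access log: (support_identity, customer_tenant) events.
access_log = [
    ("support-ana", "tenant-a"), ("support-ana", "tenant-b"),
    ("support-ben", "tenant-c"),
    ("support-eve", "tenant-a"), ("support-eve", "tenant-b"),
    ("support-eve", "tenant-c"), ("support-eve", "tenant-d"),
    ("support-eve", "tenant-e"), ("support-eve", "tenant-f"),
]

def flag_anomalies(log, multiplier=2):
    """Flag identities touching far more tenants than the team median,
    the kind of anomalous pattern that is often the first clue of an attack."""
    per_id = Counter(identity for identity, _ in log)
    baseline = median(per_id.values())
    return sorted(i for i, count in per_id.items() if count > multiplier * baseline)

print(flag_anomalies(access_log))  # ['support-eve']
```

An identity fanning out across many more customer tenants than its peers is exactly the lateral-movement signature Dani describes.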

Hardening our digital defenses with Microsoft Baseline Security Mode
http://approjects.co.za/?big=insidetrack/blog/hardening-our-digital-defenses-with-microsoft-baseline-security-mode/
Tue, 18 Nov 2025 16:00:00 +0000

The post Hardening our digital defenses with Microsoft Baseline Security Mode appeared first on Inside Track Blog.

Security isn’t just a feature—it’s a foundation.

As threats grow more varied, widespread, and sophisticated, enterprises need to rethink how they protect their environments. That’s why we, in Microsoft Digital, the company’s IT organization, took a necessary step forward and deployed Microsoft Baseline Security Mode internally across the company.

Baseline Security Mode is a new approach to endpoint protection that enforces secure-by-default configurations across our enterprise. And it’s not just about locking things down—it’s about doing so in a way that’s scalable, manageable, and respectful of user experience.

This is a story for every organization trying to balance usability with security. Baseline Security Mode is designed to help IT teams enforce protections without breaking productivity. It’s a shift toward proactive defense with standardized secure settings.

Understanding the need for Microsoft Baseline Security Mode

Security must evolve with the environment.

At Microsoft Digital, we’ve built a strong foundation of endpoint protection over the years. But as our ecosystem expanded—more devices, more workloads, more diverse user needs—we saw an opportunity to take our security posture to the next level.

Our existing configurations were effective, but they reflected the natural complexity of a large enterprise. Different teams had different requirements. Some relied on legacy technologies that had served them well. Others needed flexibility to support specialized workflows. Over time, this led to variation in how security policies were applied.

We wanted to unify that approach.

Baseline Security Mode emerged as a way to streamline and strengthen our defenses. It was about building on what worked. We started by identifying areas where legacy protocols and configurations could be modernized. That included technologies like ActiveX controls and older authentication flows, which we carefully evaluated and phased out where appropriate.

We also improved how we gather and use telemetry. Initially, we had limited visibility into how certain features were used. That made it harder to predict the impact of changes. So, we ran pilots, collected feedback, and refined our approach. Baseline Security Mode was a game changer here, providing built-in reports that gave us the visibility we needed to observe the impact of applying settings in our environment. For example, when we reviewed blocking legacy file formats, we discovered that some workflows depended on them. We responded quickly, offering alternatives and guiding users through the transition.

Ease of use was a priority.

We built intuitive controls into the Microsoft 365 admin center, allowing IT admins to manage policies with just a few clicks. No more manual scripts. No more guesswork. We also introduced exception handling to support specialized needs, ensuring that security didn’t come at the cost of productivity.

We worked closely with internal stakeholders, including compliance teams and work councils, to validate every step and build trust. We made sure the experience was smooth, the tools were reliable, and the changes were clearly communicated.

This wasn’t just a technical upgrade—it was a cultural shift.

Baseline Security Mode gave us a way to unify our security posture while honoring the diversity of our environment. It’s a smarter, more scalable way to protect our endpoints, and it reflects everything we’ve learned from years of experience.

Putting consistent security configuration into practice

Baseline Security Mode establishes a new standard, enabling organizations to be secure by default.

It is the result of a collaborative effort by multiple product teams at Microsoft, building on their security and incident-handling expertise. It’s designed to simplify and strengthen endpoint protection across Windows and Microsoft 365. The feature lives in the Microsoft 365 admin center, where IT admins can enforce modern security policies with just a few clicks.

“When we blocked certain file formats, users were confused by the error messages and thought they were blocked from saving the file. So, we ran pilots, gathered feedback, and helped the product team build an improved error experience to save blocked formats to safe, newer formats.”

Harshitha Digumarthi, senior product manager, Microsoft Digital

The product teams delivered 22 features across five workloads: Office, OneDrive and SharePoint, Teams, Substrate, and Identity. Each one targets a specific risk—blocking legacy authentication, disabling insecure protocols, restricting ActiveX, and more.
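Detecting who still depends on legacy authentication is a natural first step before blocking it. Here is a hedged sketch, assuming a hypothetical sign-in-log shape rather than any real identity provider's API:

```python
# Hypothetical sign-in records; real data would come from your identity
# provider's sign-in logs, and the protocol names here are illustrative.
LEGACY_PROTOCOLS = {"basic_auth", "smtp_auth", "pop3", "imap4"}

signins = [
    {"user": "alice", "protocol": "oauth2"},
    {"user": "bob", "protocol": "basic_auth"},
    {"user": "bob", "protocol": "oauth2"},
    {"user": "carol", "protocol": "pop3"},
]

def legacy_auth_report(records):
    """Group legacy-protocol sign-ins by user so owners can be notified
    and transitioned before the protocol is blocked."""
    report = {}
    for r in records:
        if r["protocol"] in LEGACY_PROTOCOLS:
            report.setdefault(r["user"], []).append(r["protocol"])
    return report

print(legacy_auth_report(signins))
# {'bob': ['basic_auth'], 'carol': ['pop3']}
```

Running a report like this during a pilot surfaces the dependencies you need to migrate before flipping the block on.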

When we deployed Baseline Security Mode as Customer Zero at Microsoft Digital, our job was to validate these features and controls in real-world enterprise conditions.

We pushed for exception handling.

Some users still relied on legacy formats or protocols. Certain teams, for example, needed access to older Office features. So, we worked with the product team to ensure exceptions could be built into the UI.

That flexibility was key. We knew from experience that without it, customers might hesitate to adopt the feature.

“When we blocked certain file formats, users were confused by the error messages and thought they were blocked from saving the file,” says Harshitha Digumarthi, a senior product manager at Microsoft Digital. “So, we ran pilots, gathered feedback, and helped the product team build an improved error experience to save blocked formats to safe, newer formats.”

We also pushed for better telemetry.


“When we heard about Baseline Security Mode, it was still in ideation. There were no tools in the Microsoft 365 admin center yet. We had to figure out how to enable this internally while the product team built the capabilities in parallel.”

Markus Gonis, senior service engineer, Microsoft Digital

At first, we had only a few days of data. That wasn’t enough to understand how features were used or what impact they would have. So we worked with the product team to expand telemetry, improve error reporting, and reduce false positives, including identifying bugs that skewed metrics and made troubleshooting harder.

We ran the deployment through our Tenant Trust Program and work council reviews to ensure global compliance. That gave us—and our customers—confidence.

Baseline Security Mode isn’t just a feature. It’s a shift in how we think about security, and we’re proud to have helped shape it.

Deploying Baseline Security Mode at Microsoft Digital

Rolling out Baseline Security Mode wasn’t just a technical exercise—it was a cross-team effort that demanded precision, patience, and partnership.

Microsoft Digital took the lead on deployment. We acted as Customer Zero, testing every feature in real-world conditions before it reached customers. That meant working closely with the product team to validate functionality, identify bugs, and shape the user experience.

“When we heard about Baseline Security Mode, it was still in ideation,” Gonis says. “There were no tools in the Microsoft 365 admin center yet. We had to figure out how to enable this internally while the product team built the capabilities in parallel.”

Telemetry was limited. We had only 30 days of data to work with. That made it hard to predict how changes would affect users, so we ran pilots with internal user acceptance testing cohorts and deployed in phases.


“It was a great Customer Zero experience. Our security teams stood to benefit from Baseline Security Mode features, and we helped the product team find bugs and the issues that just hadn’t come up in early testing or at a large scale. It was a win-win situation.”

John Philpott, principal product manager, Microsoft Digital

For some legacy protocols, usage was low. In these cases, the features being deployed made removing these protocols seamless. Where usage was higher or unclear, a more detailed approach was required.

First, a few thousand users. Then 50,000. Then 100,000. Eventually, the entire Microsoft tenant. We paused between each wave to monitor help desk tickets, gather feedback, and confirm that our mitigation strategies were working.
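The wave pattern described above can be expressed as a simple planning function. The wave sizes mirror the ones in this story, but the code itself is an illustrative sketch, not Microsoft's deployment tooling; in practice the gate between waves is the help desk and telemetry review described here.

```python
# Wave sizes from the rollout described above; None means "everyone remaining".
WAVES = [2_000, 50_000, 100_000, None]

def plan_waves(total_users, waves=WAVES):
    """Split a tenant population into cumulative rollout waves."""
    plan, covered = [], 0
    for size in waves:
        remaining = total_users - covered
        batch = remaining if size is None else min(size, remaining)
        if batch <= 0:
            break
        covered += batch
        plan.append((batch, covered))
    return plan

for batch, covered in plan_waves(220_000):
    print(f"deploy to {batch:>7,} users (cumulative {covered:>7,})")
```

Pausing between each tuple in the plan is where you monitor tickets and confirm mitigations before widening the blast radius.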

Communication was critical.

We ran targeted campaigns, sent individual emails, and published technical reports explaining what was changing, why it mattered, and how users could adapt. We even used Viva Engage to notify users directly. It was important to explain to users why longstanding functionality was being removed and how they could mitigate any impact.

We did a lot of work with the product team to ensure the user experience and the IT pro experience both exceeded expectations.

“It was a great Customer Zero experience,” says John Philpott, principal product manager within Microsoft Digital. “Our security teams stood to benefit from Baseline Security Mode features, and we helped the product team find bugs and the issues that just hadn’t come up in early testing or at a large scale. It was a win-win situation.”

We flagged inconsistencies in policy syntax, pushed for better error handling, and worked with the product team to align deployment tools across workloads.

But we didn’t stop at deployment. We tracked progress, validated telemetry, and signed off on each feature before it moved into broader rollout. We even helped pave the way for the next iterations, identifying features that needed more design work or deeper telemetry before they could be deployed.

This was a true partnership. The product team built the features. We tested them, validated them, and helped make them better.

Baseline Security Mode is now live across Microsoft. And it’s ready for the world.

Capturing real benefits

Baseline Security Mode is more than a set of policies—it’s a platform for proactive defense.

The product team built it to reduce legacy risks and enforce modern security standards across Microsoft 365 workloads. Microsoft Digital validated it in production, surfacing bugs, shaping telemetry, and confirming that the features worked as intended.

We tested 22 features across Office, OneDrive & SharePoint, Substrate, Identity, and Teams. Each one targeted a specific vulnerability—like blocking ActiveX controls, disabling Exchange Web Services, or enforcing phishing-resistant authentication for admins.

We flagged critical ActiveX dependencies in third-party apps—something the product group hadn’t found—which enabled them to initiate removal. That kind of early detection helped fix issues before the features reached customers.

We found regressions in PowerShell and legacy authentication flows. The OneDrive and SharePoint team caught a high-impact bug and worked with the product team to resolve it.

That validation mattered.

We also helped shape the admin experience.

Exception handling was built into the UI. Admins could create security groups, assign users, and manage exclusions directly in the Microsoft 365 admin center.

“There’s no need to handle everything manually,” Philpott says. “Simply click here and then here to disable. It’s a much simpler process.”
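Conceptually, exception handling like this reduces to a membership check against an exclusion group before a policy is enforced. Here is a minimal sketch with hypothetical policy and group names; the real flow uses security groups managed in the Microsoft 365 admin center UI:

```python
# Hypothetical policy and group model, for illustration only.
policies = {
    "block_legacy_file_formats": {"enabled": True, "exclusion_group": "legacy-formats-exceptions"},
    "restrict_activex": {"enabled": True, "exclusion_group": None},
}
group_members = {"legacy-formats-exceptions": {"finance-team-svc", "dana"}}

def policy_applies(policy_name, user):
    """A policy applies to a user unless they belong to its exclusion group."""
    p = policies[policy_name]
    if not p["enabled"]:
        return False
    group = p["exclusion_group"]
    return not (group and user in group_members.get(group, set()))

print(policy_applies("block_legacy_file_formats", "dana"))  # False: excluded
print(policy_applies("block_legacy_file_formats", "erin"))  # True
print(policy_applies("restrict_activex", "dana"))           # True
```

Modeling exceptions as group membership keeps the default secure while giving teams with specialized workflows a managed, auditable escape hatch.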

Extending benefits to Microsoft customers

Baseline Security Mode is ready for enterprise.

We’ve tested it. We’ve hardened it. And we’ve made it easier to adopt.

Microsoft Digital’s deployment journey helped shape the product into something customers can trust. We didn’t just validate features—we made sure they worked in real-world environments, across diverse teams, and under the pressure of scale.

The product team designed the features to be enterprise-ready. We ran them through our Tenant Trust Program and work council reviews to ensure compliance across global regions. That gave us confidence—and gave customers confidence too.

The benefits are clear. We’ve reduced our attack surface. We’ve improved compliance. We’ve made it easier for IT teams to enforce security without disrupting workflows. And we’ve laid the groundwork for secure-by-default computing across Microsoft.

Customers can do the same.

Start small. Run pilots. Monitor impact. Use the tools in the Microsoft 365 admin center to deploy policies, manage exceptions, and guide users through the change. And don’t be afraid to ask for help—our journey has shown that collaboration between deployment teams and product teams makes all the difference.

Baseline Security Mode is ready, and we’re ready to help others adopt it.

Looking ahead

The first wave of Baseline Security Mode—BSM 2025—delivered 22 features across five major workloads. Microsoft Digital helped validate and deploy those features across the enterprise. And the next wave of features is already in motion.

And it’s bigger, with 46 features, more than double what we had in the first round. The product team is expanding coverage to include deeper protocol restrictions, broader app controls, and more granular authentication policies.

We’re also preparing for broader industry adoption.  

Governments, regulators, and enterprise customers are asking for secure-by-default configurations. Baseline Security Mode is our answer. And the next version will make it even easier to adopt.

We’ll continue to lead as Customer Zero. We’ll test new features, validate insights surfaced by telemetry, and share feedback with the product team. We’ll run pilots, monitor impact, and guide users through the change. And we’ll keep pushing for simplicity, scalability, and trust.

Because security isn’t a one-time project—it’s a mindset, and it’s Microsoft’s highest priority.

Key takeaways

Ready to adopt Baseline Security Mode? Here are some actions we recommend based on our deployment experience:

  • Start with a pilot: Test Baseline Security Mode with a small group of users to identify legacy dependencies and gather feedback before scaling.
  • Use the Microsoft 365 admin center for deployment: Apply policies and manage exceptions directly through the UI—no scripting required.
  • Identify and plan for exceptions early: Work with business units to understand where legacy formats or protocols are still needed and create security groups for exclusions.
  • Communicate proactively with users: Launch campaigns to explain upcoming changes, their impact, and how users can adapt.
  • Validate telemetry and error reporting: Ensure your environment captures enough data to monitor the impact of new policies and troubleshoot effectively.
  • Engage your compliance and governance stakeholders: Review new policies with internal governance teams to ensure alignment with organizational and regional standards.
  • Treat security as an ongoing journey: Continue to monitor, iterate, and evolve your security posture as new threats and features emerge.

Powering agentic AI adoption at Microsoft: Our ‘Customer Zero’ story
http://approjects.co.za/?big=insidetrack/blog/powering-agentic-ai-adoption-at-microsoft-our-customer-zero-story/
Thu, 13 Nov 2025 18:45:00 +0000

The post Powering agentic AI adoption at Microsoft: Our ‘Customer Zero’ story appeared first on Inside Track Blog.

At Microsoft, we are enabling our employees, teams, and organizations to build AI agents to help them complete important tasks—from individual employees in the personal productivity tenant all the way to enterprise-wide agents that are available to everyone.

In short, we’re all-in on agentic AI, and we want to help you get there, too.

“We’ve made a lot of progress deploying and driving adoption of Microsoft 365 Copilot since it was released, and we’re now doing the same when it comes to enabling our employees and our teams to build agents that make us more productive,” says Brian Fielder, vice president of Microsoft Digital, the company’s IT organization. “We’re Customer Zero at Microsoft, which means we’re the first to deploy and use the technology and services that we sell to our customers. Those learnings give us a unique perspective and story to share with you about the journey we’ve been on with AI and agents.”

We have two collections of agentic AI content that we think will be useful to you.


“When it comes to agents, we’re still at the start. We expect to learn much more as we continue, lessons we’ll share here—stay connected and we’ll continue to share our story with you.” 

Brian Fielder, vice president, Microsoft Digital

The first set of stories documents our vision and strategy for agents. They walk you through our experience deploying agentic AI, our work to create tools that enable our employees to dive in, and how, through smart governance, we empower everyone at Microsoft to be confident and creative with how they use agents while keeping the company safe and secure.

Our second set of stories highlights some of the most interesting and effective agents that our employees, teams, and organizations have built. These stories not only give you examples of agents that we’ve built, they also show how you can go about building similar agents for your organization based on the collective experience of our employees and teams at Microsoft.

“We hope you find reviewing the journey we’ve been on practical and useful,” Fielder says. “When it comes to agents, we’re still at the start. We expect to learn much more as we continue, lessons we’ll share here—stay connected and we’ll continue to share our story with you.”  


Deploying agentic AI at Microsoft


Agents we’ve deployed internally at Microsoft


Key takeaways

We hope that you find our agentic AI stories useful. We wanted to share a mixture of our strategy and vision around enabling our employees to deploy agents, and to share stories that feature some of the most promising agents that our employees and teams have built and deployed.

We also understand that it can feel challenging to know where to start—it was for us. Here are some things we learned along the way that should help you:

  • Governing agents is complex and dependent on the overall AI maturity of your organization. Start slowly to build that maturity before unleashing too many new agents in your environment.
  • A strong policy framework is the foundation. Lean on existing app governance policies, then layer agent-specific structures on top.
  • Invest in data infrastructure and AI platforms. Building robust data infrastructure ensures your organization is prepared to leverage AI, and supports scalable, innovative, and secure AI-driven solutions.
  • Develop a building environment strategy. Decide what scenarios match up with specific environments and make the right environments available to the relevant employees.
  • Global regulations around categories like privacy, security, and responsibility provide a good baseline for establishing governance policies. Set relevant teams to work thinking through these regulations and incorporate their insights into your agent governance.
  • Foster a culture of creativity and teamwork. Champion an AI-forward culture where innovation and collaboration drive the adoption of agentic AI.
  • Develop AI expertise through training and development. As agentic AI transforms workflows and business outcomes across every industry, upskilling will empower your teams to navigate the rapid advances of AI, drive innovation, and ensure your organization stays competitive.
  • Align AI initiatives with strategy. Ensuring AI initiatives align with business goals maximizes their impact and positions your organization to succeed in the rapidly evolving world of agentic AI.
  • Implement ethical AI practices. You can use Microsoft’s Responsible AI principles as a guide. Adopting ethical AI practices builds trust, ensures responsible innovation, and prepares your organization to navigate the evolving landscape as AI becomes central to business operations and decision-making.

Vuln.AI: Our AI-powered leap into vulnerability management at Microsoft
http://approjects.co.za/?big=insidetrack/blog/vuln-ai-our-ai-powered-leap-into-vulnerability-management-at-microsoft/
Thu, 16 Oct 2025 16:05:00 +0000

The post Vuln.AI: Our AI-powered leap into vulnerability management at Microsoft appeared first on Inside Track Blog.

In today’s hyperconnected enterprise landscape, vulnerability management is no longer a back-office function—it’s a frontline defense. With thousands of devices from a multitude of vendors, and a relentless stream of Common Vulnerabilities and Exposures (CVEs), here at Microsoft we faced a challenge familiar to every IT decision maker: how to scale vulnerability response without scaling cost, complexity, or risk.


“While AI enables amazing capabilities for knowledge workers, it also increases the threat landscape, since bad actors using AI are constantly probing for vulnerabilities. Vuln.AI helps keep Microsoft safe by identifying and accelerating the mitigation of vulnerabilities in our environment.”

Brian Fielder, vice president, Microsoft Digital 

Enter Vuln.AI, an intelligent agentic system developed by our team in Microsoft Digital—the company’s IT organization—to transform how we identify, prioritize, and resolve vulnerabilities across our enterprise network.

Manual methods can’t keep up

As a company, we detect over 600 million cybersecurity threats every day, according to our latest Digital Defense Report. Some of those signals are bad actors probing our internal network and infrastructure looking for unpatched vulnerabilities. Our infrastructure supports over 300,000 employees and vendors, 25,000 network devices, and over 560 buildings across 102 countries. This scale means we face a constant stream of vulnerabilities—each requiring triage, impact analysis, and remediation.

“While AI enables amazing capabilities for knowledge workers, it also increases the threat landscape, since bad actors using AI are constantly probing for vulnerabilities. Vuln.AI helps keep Microsoft safe by identifying and accelerating the mitigation of vulnerabilities in our environment,” says Brian Fielder, a vice president within Microsoft Digital. 

Historically, our Infrastructure, Networking, and Tenant team here in Microsoft Digital relied on manual assessments to determine which network devices were impacted by new vulnerabilities. Traditional vulnerability scanning tools generate a lot of false positives and false negatives, and a significant amount of analysis still falls to security engineers, requiring manual validation before any vulnerability impact can be communicated to device owners. These manual methods were time-consuming, error-prone, and reactive—our security engineers were spending hours on each vulnerability, at times missing critical threats or sinking too much time into false alarms.


“AI’s true power lies in the problem it’s applied to. Start by identifying the most time-consuming or painful task in your organization—then explore how AI can augment or improve it. Begin with a small, targeted enhancement and iterate continuously.”

Ankit Bansal, senior product manager, Microsoft Digital

With the vast number of vulnerabilities coming in every day, security engineers needed a scalable way to quickly analyze, prioritize, and respond.

The solution: Vuln.AI

We had already achieved dramatic impact with our AI Ops and Network Infrastructure Copilot, which is on track to save us over 11,000 hours of network service management time per year. We built Vuln.AI on top of that investment:

  1. The Research Agent analyzes vulnerability feeds and network metadata from our Infrastructure Data Lakehouse (IDL) built on top of Azure Data Explorer, which regularly ingests data from our device vendors and other sources. Once new vulnerabilities are detected, it automates the identification of impacted devices and integrates with other internal tooling for validation and reporting.
  2. The Interactive Agent acts as a gateway for engineers and device owners to ask follow-up questions and initiate remediation. Through agent-to-agent interaction, it leverages our Network Infrastructure Copilot to query the research agent’s findings. This agentic interface enables real-time decision-making and contextual insights.

Together, these agents are significantly improving our network security operations. The results we’re seeing so far are compelling:

  • A 70% reduction in time to vulnerability insights, enabling faster prioritization and mitigation, minimizing exposure windows.
  • Lower risk of compromise through increased accuracy, quicker detection, and containment of threats.
  • A stronger compliance posture that supports adherence to financial, legal, and regulatory requirements.
  • Higher accuracy in identifying vulnerable devices, reducing false positives and missed threats.
  • Engineering hours saved and reduced fatigue, significantly improving productivity.

Our gains translate to lower operational risk, faster response times, and more resilient infrastructure—critical outcomes for any enterprise navigating today’s threat landscape.

“AI’s true power lies in the problem it’s applied to,” says Ankit Bansal, a senior product manager within Microsoft Digital. “Start by identifying the most time-consuming or painful task in your organization—then explore how AI can augment or improve it. Begin with a small, targeted enhancement and iterate continuously.”

How Vuln.AI works

The system continuously ingests CVE data from our device suppliers’ API feeds and a publicly available database of known cybersecurity vulnerabilities. It correlates that data with device attributes such as hardware model and OS version to identify the potential impact on the network and surface actionable insights.
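That correlation step can be illustrated with a small sketch. The CVE and device record shapes here are hypothetical stand-ins for the vendor feeds and Infrastructure Data Lakehouse data described in this story:

```python
# Hypothetical CVE and inventory shapes; real feeds come from vendor APIs and
# the public CVE database. OS versions are (major, minor) tuples so they
# compare correctly.
cves = [
    {"id": "CVE-2026-0001", "model": "SwitchX-9000", "fixed_in": (15, 2)},
    {"id": "CVE-2026-0002", "model": "RouterY-400", "fixed_in": (3, 8)},
]
devices = [
    {"name": "bldg7-sw-01", "model": "SwitchX-9000", "os": (15, 1)},
    {"name": "bldg7-sw-02", "model": "SwitchX-9000", "os": (15, 3)},
    {"name": "dc2-rtr-09", "model": "RouterY-400", "os": (3, 5)},
]

def impacted_devices(cves, devices):
    """Match each CVE to devices of the affected model running an OS
    older than the fixed version."""
    findings = {}
    for cve in cves:
        hits = [d["name"] for d in devices
                if d["model"] == cve["model"] and d["os"] < cve["fixed_in"]]
        if hits:
            findings[cve["id"]] = hits
    return findings

print(impacted_devices(cves, devices))
# {'CVE-2026-0001': ['bldg7-sw-01'], 'CVE-2026-0002': ['dc2-rtr-09']}
```

In Vuln.AI the equivalent matching runs at far larger scale, with LLM-assisted parsing of security advisories feeding the structured comparison.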

Engineers interact with the system via Copilot, Teams, or custom tooling, which allows seamless integration with our network security teams’ daily workflows.

“We built a hybrid approach in Vuln.AI to guide LLMs through complex security advisories,” says Blaze Kotsenburg, a software engineer in Microsoft Digital. “By combining structured function calls, templated prompts, and data validation, we keep the model focused on producing reliable, actionable insights for vulnerability mitigation.”

A photo of Lollis.

“We chose Durable Functions for Vuln.AI because it allowed us to confidently orchestrate complex, stateful research. The reliability and simplicity of the framework meant we could shift our focus to engineering the intelligence behind the agent, especially the prompting strategies used in Vuln.AI’s backend processing.”

Mike Lollis, senior software engineer, Microsoft Digital

When it came to building Vuln.AI, we relied heavily on our own technology platforms, including: 

  • Azure AI Foundry for model development and deployment
  • Azure Data Explorer to store device metadata and CVEs
  • Agent-to-agent interaction with our Network Infrastructure Copilot to query our database for device and inventory knowledge
  • Azure OpenAI models for natural language processing and classification
  • Azure Durable Functions for fine-grained orchestration and custom LLM workflows

“We chose Durable Functions for Vuln.AI because it allowed us to confidently orchestrate complex, stateful research,” says Mike Lollis, a senior software engineer in Microsoft Digital.  “The reliability and simplicity of the framework meant we could shift our focus to engineering the intelligence behind the agent, especially the prompting strategies used in Vuln.AI’s backend processing.”
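The orchestrator pattern Lollis describes can be illustrated with a plain-Python simulation: a single orchestrator function drives a sequence of activity steps and carries state between them. The activity names below are hypothetical, and this sketch only models the control flow — the production implementation runs on Azure Durable Functions.

```python
# Plain-Python simulation of the orchestrator pattern. Each "activity"
# is stubbed; in Durable Functions these would be activity functions
# invoked by a durable orchestrator.

def detect_new_cves():
    """Activity: pull newly published CVEs (stubbed here)."""
    return ["CVE-2025-0001"]

def identify_impacted_devices(cve):
    """Activity: look up devices affected by a CVE (stubbed here)."""
    return [f"device-a-{cve}", f"device-b-{cve}"]

def report_findings(cve, devices):
    """Activity: push validated findings to internal tooling (stubbed)."""
    return {"cve": cve, "impacted": devices, "status": "reported"}

def research_orchestrator():
    """Orchestrator: run the research pipeline step by step."""
    findings = []
    for cve in detect_new_cves():
        devices = identify_impacted_devices(cve)
        findings.append(report_findings(cve, devices))
    return findings

print(research_orchestrator()[0]["status"])  # reported
```

The value of the durable version is that each step is checkpointed, so a long-running research job can survive process restarts without losing state.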

Vuln.AI in action

Consider a common scenario: a new CVE that affects a network switch has just been published. Vuln.AI’s research agent immediately flags the vulnerability, maps it to potentially affected devices in our network inventory, and pushes the findings to an internal database.

A photo of Lee.

“AI is only as good as the data you provide. Much of the success with Vuln.AI came from our dedicated efforts to source comprehensive vulnerability data and device attributes. For effective AI-powered solutions, you really need to invest in a strong data foundation and a strategy for how to integrate into the rest of your infrastructure.”

Linda Lee, product manager II, Microsoft Digital

This data then becomes immediately accessible in our internal tools, where it is validated and approved by security engineers. Network engineers then receive precise information about their vulnerable devices.

Engineers can prompt Vuln.AI’s interactive agent to instantly retrieve information like the following:

“12 devices impacted by CVE-2025-XXXX. Would you like me to suggest some next steps for mitigation or remediation?”

With Vuln.AI, network engineers can now begin vulnerability response operations much more quickly—no spreadsheet wrangling and no delays.
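A response in that shape is straightforward to generate once the impacted-device list exists. Here is a hypothetical sketch — the function name and message format are assumptions modeled on the example response above, not Vuln.AI’s actual code.

```python
# Hypothetical sketch: summarize CVE impact for an engineer in the
# same shape as the example agent response above.

def summarize_impact(cve_id: str, impacted: list) -> str:
    count = len(impacted)
    noun = "device" if count == 1 else "devices"
    return (
        f"{count} {noun} impacted by {cve_id}. Would you like me to "
        "suggest some next steps for mitigation or remediation?"
    )

print(summarize_impact("CVE-2025-XXXX", [f"sw-{i:02d}" for i in range(12)]))
```

In the real system the LLM produces the conversational framing; structured helpers like this keep the counts and identifiers grounded in validated data.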

“AI is only as good as the data you provide,” says Linda Lee, a product manager II within Microsoft Digital. “Much of the success with Vuln.AI came from our dedicated efforts to source comprehensive vulnerability data and device attributes. For effective AI-powered solutions, you really need to invest in a strong data foundation and a strategy for how to integrate into the rest of your infrastructure.”

It’s about automating manual workflows and research.

“Vuln.AI has reduced our triage time by over 50%,” says Vincent Bersagol, a principal security engineer in Microsoft Digital.

This is allowing our engineers to focus on deeper analysis.

“The synergy between security and AI engineering has unlocked a new level of precision in vulnerability insights,” Bersagol says. “This is just the beginning.”

The journey ahead

Our journey with AI-powered vulnerability management has only just begun. Looking ahead, our roadmap for Vuln.AI includes:

  • Extending data coverage to include more hardware suppliers
  • Integrating more detailed device profiles for more targeted vulnerability response
  • Supporting autonomous workflows to streamline network engineers’ remediation efforts
  • Incorporating other AI agents to support more security use cases

These enhancements will further reduce risk, accelerate response times, and empower engineers to focus on more strategic initiatives.

“Trust is the foundation of everything we do in Microsoft Digital,” Bansal says. “Securing our network is essential to upholding that trust. Intelligent solutions like Vuln.AI not only help us stay ahead of emerging threats—they also establish the blueprint for integrating AI more deeply into our security operations.”

For IT leaders, Vuln.AI offers a blueprint for modern vulnerability management:

  • Scalable: Handles thousands of devices and vulnerabilities with ease
  • Accurate: Reduces false positives and missed threats
  • Efficient: Saves time, money, and resources
  • Secure: Built on Microsoft’s trusted AI and security frameworks

In a world where every second counts and any threat can be costly, Vuln.AI transforms vulnerability management from a bottleneck into a competitive advantage for Microsoft.

Key takeaways

As your organization looks for ways to improve security and threat response in a fast-changing landscape, consider the following insights on how AI is reshaping vulnerability management at Microsoft:

  • Fight fire with fire: The threat landscape has broadened dramatically due to bad actors using AI. Supplementing your own efforts with AI can help you manage your risk more effectively than traditional vulnerability management.
  • Agility is key: Effective vulnerability response hinges on acting fast. An AI-powered solution like Vuln.AI can cut the time needed to analyze and mitigate vulnerabilities by over 50%, enabling organizations to enhance security operations at scale.
  • The future is now: Looking ahead, Microsoft Digital will integrate agentic workflows into more security operations, boosting efficiency in risk prevention, threat detection, and response, and enabling security practitioners and developers to focus on more strategic projects.

The post Vuln.AI: Our AI-powered leap into vulnerability management at Microsoft appeared first on Inside Track Blog.

Keeping our in-house optical network safe with a Zero Trust mentality http://approjects.co.za/?big=insidetrack/blog/keeping-our-in-house-optical-network-safe-with-a-zero-trust-mentality/ Thu, 16 Oct 2025 16:00:00 +0000 http://approjects.co.za/?big=insidetrack/blog/?p=20611 When it comes to corporate connectivity at Microsoft, a minute of lost connection can lead to catastrophic disruptions for our product teams, sleepless nights for our network engineers, and millions of dollars of lost value for the company. That’s why we built our own optical network at our headquarters in Washington state, and that’s why […]

The post Keeping our in-house optical network safe with a Zero Trust mentality appeared first on Inside Track Blog.

When it comes to corporate connectivity at Microsoft, a minute of lost connection can lead to catastrophic disruptions for our product teams, sleepless nights for our network engineers, and millions of dollars of lost value for the company.

That’s why we built our own optical network at our headquarters in Washington state, and that’s why we’re building similar networks at other regional campuses around the United States and the rest of the world.

With so much on the line, we need to make sure these in-house networks never go down.

But how are we doing that?

We’re applying the same robust Zero Trust approach we take to security and identity. While our optical networks are extremely reliable, any complex system can be knocked offline. In alignment with our companywide Zero Trust mentality, we couldn’t simply trust the integrity of what we’d built; we needed a resilient backup system that went beyond redundancy to provide true resilience.

Driven by this goal, we created a Zero Trust Optical Business Continuity Disaster Recovery (BCDR) network that combines two fully independent optical systems designed to sustain uninterrupted services, even during systemic failures. The result is more confidence for our employees and vendors, less pressure on our network engineers, and comprehensive network resilience that will protect us against a major outage.

The urgency of resilience

In 2021, our team in Microsoft Digital, the company’s IT organization, deployed our first next-generation optical network to serve the exclusive network needs of our Puget Sound metro campuses. It offers more bandwidth on less fiber for a lower operational cost than leasing from traditional carriers.

“Puget Sound is a highly concentrated developer network where we need to provide very high throughput,” says Patrick Alverio, principal group software engineering manager for Infrastructure and Engineering Services within Microsoft Digital. “Our optical system is the backbone of all that traffic.”

Our state-of-the-art optical network fulfills our need for fast and reliable connectivity at up to 400 Gbps between core sites, labs, data centers, and the internet edge. We built this network on Reconfigurable Optical Add/Drop Multiplexer (ROADM) technology, which delivers dynamic reconfiguration; colorless, directionless, contentionless (CDC) capabilities; flexible grid support; remote provisioning; and automation. It also features a full-mesh topology that provides a layer of redundancy.

But what if the entire ROADM-based system fails?

There are plenty of operational risks that can derail even the most robust network. Anything from misconfigured automation scripts to policy changes to misaligned software versioning to simple human error can cause outages.

A photo of Elangovan.

“We don’t want even a second of downtime. We needed a life raft for when failures occur that could also function as a standby network for core site migrations or platform upgrades.”

Vinoth Elangovan, senior network engineer, Hybrid Core Network Services, Microsoft Digital

To some degree, those kinds of minor disruptions are inevitable. But catastrophic events like fiber cuts, failures in the ROADM operating system, or even natural disasters have the potential for even more wide-ranging disruption.

During a catastrophic outage, thousands of engineers, developers, researchers, and other technical employees who need access to crucial lab environments and data centers could lose connectivity. That can sabotage feature delivery, disrupt product patches, interrupt updates, and halt all kinds of core product functions.

On top of normal software development operations, new AI tools demand massive bandwidth and consistent uptime. Finally, our hybrid networks feature paths integrated with Microsoft Azure that consume on-premises resources, so they also stand to benefit from increased resilience.

A catastrophic network outage can cause incredible damage to all of these business functions. In fact, we experienced exactly that in 2022.

A fiber cut combined with a ROADM system hardware reboot caused a five-minute outage at our Puget Sound metro region. In this environment, every minute of lost connectivity can result in significant financial impact, making network resilience absolutely essential.

“We don’t want even a second of downtime,” says Vinoth Elangovan, senior network engineer, who designed and implemented the Zero Trust Optical BCDR network for Microsoft. “We needed a life raft for when failures occur that could also function as a standby network for core site migrations or platform upgrades.”

Delivering greater network resilience

To ensure we could deliver uninterrupted network connectivity even in the midst of a catastrophic outage, we needed to consider the technical demands of a truly resilient system. Five design pillars helped us assemble our architectural criteria:

  1. Independent optical systems: To provide true resilience, our primary and BCDR platforms needed to operate autonomously.
  2. Physically independent paths: Circuits should avoid shared conduits, fibers, and splices to operate completely independently.
  3. Separate control software: The primary and backup networks should operate through dedicated network management systems (NMSs), automation, and provisioning domains.
  4. Unified client interface: Both systems needed to terminate into the same interface to unify service for clients and applications.
  5. Survivability by design: We couldn’t assume that any system would be immune to failure. Instead, we built for the best possible outcomes.

The result was the Zero Trust Optical BCDR architecture, a layered approach to optical networking. It consists of our primary, ROADM-based transport layer and a secondary, MUX-based transport layer, both terminating into a single logical port channel.

“Our core responsibility is the employee experience, so our main design thrust was making sure service is seamless and uninterrupted—even during an outage.”

Vinoth Elangovan, senior network engineer, Hybrid Core Network Services, Microsoft Digital

Both systems are live and active, which means they deliver production services through their own independent fibers, power supplies, and software stacks. By layering fully independent optical domains and logically unifying them at the Ethernet edge, the network can sustain a complete failure of one system and maintain continuity.

That physical and operational independence is the difference between simple redundancy and robust resilience.

“Our core responsibility is the employee experience, so our main design thrust was making sure it’s seamless and uninterrupted—even during an outage,” Elangovan says.

Optical network backed by a BCDR network

A schematic of an optical network running between different nodes and backed up by a BCDR network.
The optical network in our Puget Sound region connects core sites to labs, datacenters, and the internet edge, while the BCDR network provides backup connections to deliver resilience in case of a catastrophic network failure.

A typical ROADM optical network connects campus and data center sites to the internet edge. Our design features three interconnected optical rings, with two internet edges as multi-directional nodes, while other sites operate as dual-degree nodes with bidirectional redundancy. Meanwhile, our campuses and datacenters are designated as critical sites and equipped with Optical BCDR links to ensure enhanced resiliency. In the event of a complete Optical ROADM line failure, these critical sites retain connectivity.

In the event of an outage on the primary network, the port channel handles forward continuity automatically, shifting WAN traffic between optical paths in real time.

The transition occurs seamlessly and transparently, with no noticeable impact to clients.
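The failover behavior can be modeled as simple member-link selection on a logical port channel. This is a simplified illustration of the concept only — real failover happens in the network hardware, not in application code.

```python
# Simplified model of port-channel failover between the two optical
# domains. Link names are illustrative; this only models the
# selection behavior described above.

def active_members(links: dict) -> list:
    """Traffic is load-balanced across all healthy member links."""
    return [name for name, healthy in links.items() if healthy]

# Both optical domains live: traffic uses both paths.
links = {"roadm_primary": True, "mux_bcdr": True}
assert active_members(links) == ["roadm_primary", "mux_bcdr"]

# Complete failure of the primary ROADM system: the port channel
# shifts all WAN traffic to the BCDR path, with no change to the
# single logical interface that clients see.
links["roadm_primary"] = False
assert active_members(links) == ["mux_bcdr"]
print("failover handled transparently")
```

The key design point is that clients only ever see the one logical interface, so the path change is invisible to them.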

A photo of Martin.

“Our initial goal was to provide high-throughput connectivity for major labs, with less than six minutes of downtime per year. That represents a service level of 99.999% network continuity, and we’re aiming for even better moving forward.”

Blaine Martin, principal engineering manager, Hybrid Core Network Services, Microsoft Digital

Coupling at the Ethernet layer provides clients and applications with one logical interface, automatic load balancing and traffic distribution, and seamless failover, regardless of which optical domain is providing service.

“Our initial goal was to provide high-throughput connectivity for major labs, with less than six minutes of downtime per year,” says Blaine Martin, principal engineering manager for Hybrid Core Network Services in Microsoft Digital. “That represents a service level of 99.999% network continuity, and we’re aiming for even better moving forward.”
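The “five nines” figure is easy to verify: 99.999% availability allows roughly 5.26 minutes of downtime per year, which is where the less-than-six-minutes target comes from.

```python
# Sanity check: how much downtime per year does a given availability allow?

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes

def allowed_downtime_minutes(availability: float) -> float:
    """Minutes of downtime per year permitted at a given availability."""
    return (1.0 - availability) * MINUTES_PER_YEAR

print(round(allowed_downtime_minutes(0.99999), 2))  # 5.26
```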

A new era of confidence for network engineers

For the network engineers who keep Microsoft employees and resources connected, the Zero Trust Optical BCDR network relieves much of the pressure that comes from resolving outages.

“Before, we were dependent on a single system, even with redundancies, so the human experience was like firefighting. Now, if the primary optical network is having a problem, I don’t even see it.”

Kevin Bullard, principal cloud network engineering manager, Microsoft Digital

When a network goes down, engineers have an enormous set of responsibilities to manage: processing the incident report, assigning severity, performing checks, notifying internal teams, providing updates, and engaging with physical support teams—all with a profound urgency to restore productivity.

Dialing those pressures back has been a huge benefit.

“Before, we were dependent on a single system, even with redundancies, so the human experience was like firefighting,” says Kevin Bullard, Microsoft Digital principal cloud network engineering manager responsible for maintaining WAN interconnectivity between labs. “Now, if the primary optical network is having a problem, I don’t even see it.”

There will always be pressure on network engineers to restore connectivity during an outage, but they can breathe easier knowing it won’t cost the company millions of dollars as the time to resolve ticks away. And in non-emergency situations like core site migrations, the BCDR network provides a much easier way to shunt services while the main network is offline.

“Our internal users have become more confident that they can stay connected, no matter what,” says Chakri Thammineni, principal cloud network engineer for Infrastructure and Engineering Services in Microsoft Digital. “That gives the people responsible for maintaining our enterprise networks incredible peace of mind.”

Fortunately, there hasn’t been a substantial network outage in the Puget Sound metro area since 2022. But our network engineering teams know that if and when it happens, the BCDR network will be ready to maintain service continuity.

A photo of Alverio.

“We’re always looking ahead into industry trends to stay at the bleeding edge, whether that’s in the technology we provide for our customers or the networks we use to do our own work.”

Patrick Alverio, principal group software engineering manager, Infrastructure and Engineering Services, Microsoft Digital

With our Puget Sound network protected, we have plans in place to extend this model to other metro areas. Naturally, we have to balance population, criticality, and the knowledge that elevated reliability and availability come with a cost.

Our selection criteria for new BCDR networks have largely centered around two factors: expansions of AI-critical infrastructure and concentrations of secure access workspaces (SAWs) for technical employees. With these criteria in mind, we’re planning new BCDR networks first in the Bay Area and Dublin, then in Virginia, Atlanta, and London.

Zero Trust optical BCDR architecture represents a paradigm shift in enterprise network resilience, and we’re committed to expanding the model to benefit both conventional workloads and the expanding infrastructure demands of AI.

“We’re always looking ahead into industry trends to stay at the bleeding edge, whether that’s in the technology we provide for our customers or the networks we use to do our own work,” Alverio says. “We refuse to accept the status quo, and we’re elevating the experience for employees across Puget Sound and Microsoft as a whole.”

Driving AI innovation in optical network resilience

Our journey towards an AI-driven optical network is gaining momentum.

As part of our Secure Future Initiative, we’ve automated our Optical Management Platform credential rotation and are actively developing intelligent incident-management ticket enrichment, auto-remediation, link provisioning, deployment validation, and capacity planning.

AI plays a central role in this transformation.

With Microsoft 365 Copilot and GitHub Copilot integrated into our engineering workflows, we’re accelerating development cycles, improving code accuracy, and uncovering optimization opportunities that would otherwise take hours of manual effort.

These Copilots are also helping our engineers analyze network patterns, simulate outcomes, and validate deployment logic before execution, reducing human error and strengthening our Zero Trust posture. Over time, we’re evolving toward a system where AI not only assists but proactively predicts potential disruptions, recommends remediations, and continuously learns from operational telemetry.

These advancements are paving the way for a future where our optical infrastructure can anticipate issues, recover faster, and operate with the agility and assurance expected in a Zero Trust environment.

Key takeaways

If you’re considering implementing your own optical and BCDR networks, consider these tips:

  • Understand the technical components of resilience: Independent optical systems, physically independent paths, separate control software, a unified client interface, and survivability by design are the key technical components of true resilience.
  • Plan from a preparedness and value perspective: Evaluate the critical points in your infrastructure and determine where you can get the most value out of resilient connectivity.
  • Ensure your teams have the right skillset: Carefully consider the right workforce to run those systems and be accountable for their operation.


Transforming security and compliance at Microsoft with Windows Hotpatch http://approjects.co.za/?big=insidetrack/blog/transforming-security-and-compliance-at-microsoft-with-windows-hotpatch/ Thu, 02 Oct 2025 16:05:00 +0000 http://approjects.co.za/?big=insidetrack/blog/?p=20455 Security updates are essential, and every security admin knows that when it comes to applying these updates, faster is better to mitigate the risk. However, security updates have always come with a catch: Windows needs to reboot to apply them. Reboots mean interrupted productivity and downtime for users. For us at Microsoft Digital, Microsoft’s internal […]

The post Transforming security and compliance at Microsoft with Windows Hotpatch appeared first on Inside Track Blog.

]]>
Security updates are essential, and every security admin knows that when it comes to applying these updates, faster is better to mitigate the risk. However, security updates have always come with a catch: Windows needs to reboot to apply them.

Reboots mean interrupted productivity and downtime for users.

For us at Microsoft Digital, Microsoft’s internal IT organization, Windows Hotpatch changes the equation.

It’s a new way to deliver critical Windows updates without rebooting. That means faster compliance, less downtime, and happier users.

We’re using it across Microsoft and it’s already transforming how we think about security and productivity.

“Hotpatch is helping Microsoft reach compliance faster than ever—no reboots, no delays, secure systems at scale, and a seamless experience that keeps users more productive. The risk exposure window is reduced drastically, making our environment safer and more resilient,” says Harshitha Digumarthi, a senior program manager within Microsoft Digital.

Hotpatch installs updates while the system is running—no reboot required. That means we can patch faster, stay compliant, and keep users happy.

And it’s not just us.

Microsoft enterprise customers are already scaling deployments to millions of devices. We’re seeing a shift in how organizations think about patching and how quickly they can apply patches. Hotpatch is here to help. Patching is no longer a disruption; it’s just part of the flow.

Increasing productivity and security with Hotpatch

Hotpatch is a servicing technology that delivers cumulative security updates—released on Patch Tuesday, the second Tuesday of each month—without requiring a system reboot. Instead of replacing binaries on disk and restarting the system, Hotpatch modifies in-memory code while the system is running.

This means updates take effect immediately, with no downtime, no maintenance windows, and no disruption to users.

Hotpatch payloads are small by design. Smaller updates mean faster downloads, quicker installs, and minimal impact on performance. CPU usage stays low. No spikes. No slowdowns. Just updates that run in the background and finish silently.

“The experience is so seamless you don’t even know what happened,” says Nevine Geissa, a partner group program manager within the Windows product team. “There are no process restarts, no logging out, no performance impact. No glitch in the video playing or transaction dropping. Everything just works as if nothing has happened.”

Because hotpatch updates happen so painlessly in the background, IT administrators may want to understand how the process works and what validation steps are involved. That’s why we test hotpatch updates with the same rigorous standards we apply to all our security updates.

A photo of Geissa.

“Hotpatch updates go through the exact same validation and rigor that a standard security update goes through. There is no compromise on quality whatsoever. Your device is always as secure as your non-hotpatch device.”

Nevine Geissa, partner group program manager, Windows Servicing and Delivery

Even in cases of zero-day vulnerabilities, Hotpatch can deliver out-of-band updates to enrolled devices without requiring a reboot.

Hotpatch is available for Windows 11 version 24H2 or later, Windows 365, Azure Virtual Desktop, Windows Server 2022/2025 Azure Edition, and Azure Arc connected Windows Server 2025 Datacenter and Standard editions.

The technology has matured over years of internal development.

“Hotpatch updates go through the exact same validation and rigor that a standard security update goes through,” Geissa says. “There is no compromise on quality whatsoever. You will always be at the exact same level of security.”

Hotpatch has evolved and grown.

“It started as internal server capability in Azure and then expanded to our Windows Server 2022 customers,” says Nikita Deshpande, a senior customer experience program manager within the Windows Servicing and Delivery product team at Microsoft. “The tooling and OS support have matured such that we can now offer Hotpatch to AMD64 and Arm64 client machines too.”

Hotpatch integrates seamlessly with Autopatch, a cloud-based service from Microsoft that automates the process of keeping Windows devices up to date. Designed for enterprise environments and powered by Microsoft Intune, Autopatch manages updates for Windows, Microsoft 365 Apps for enterprise, Microsoft Edge, and Microsoft Teams, reducing the manual effort required by IT administrators.

Any new policy in our environment created with Autopatch automatically enables Hotpatch—if the device meets requirements. Admins can set up rings, monitor compliance, and roll out updates with just a few clicks.

“It’s the better together story,” Deshpande says. “Autopatch streamlines everything. Add Hotpatch, and it takes Windows Update to a whole new level.”

Implementing Windows Hotpatch internally at Microsoft

Implementing Hotpatch at Microsoft Digital meant both developing and deploying the feature and establishing trust with customers.

The journey started years ago in Azure with virtual machines, then expanded to Windows Server across physical and virtual instances. Now it’s on Windows 11 clients and scaling fast, but getting here took deep collaboration.

Our team in Microsoft Digital partnered with the product team from the start. As co-designers with experience in this space, we helped shape the rollout, validate the experience, and make sure Hotpatch was ready for enterprise scale.

Then we scaled. We expanded to 40,000, then 80,000, then 120,000 devices. We’re on track to reach 450,000 devices at Microsoft in the next four months.

We also wanted the product to enable a great admin experience. Its features support a smooth rollout, and its visibility helps admins monitor deployments and measure impact. We’re continually collaborating with the Windows product team to equip administrators with comprehensive insights and actionable recommendations for Hotpatch.

“We worked closely with the product team to make sure admins had the right metrics to measure the success,” Digumarthi says. “It’s not just about implementation—it’s about knowing it worked.”

We ran early adopter programs and insider rings to gather feedback from across Microsoft. That feedback loop helped refine the experience, improve reporting, and ensure the rollout was smooth.

Achieving security without compromising on productivity

Hotpatching is changing how we think about security.

“With Hotpatch, we’re seeing 81% of Microsoft’s enrolled devices become compliant within 24 hours of Patch Tuesday and 90% of enrolled devices are patched within five days.”

Harshitha Digumarthi, senior program manager, Microsoft Digital

Before, it took our team up to nine months to reach 95% compliance for security patching.

That’s nine months of exposure and nine months of risk.

With Hotpatch, we’re achieving 95% compliance in less than three weeks.

“With Hotpatch, we’re seeing 81% of Microsoft’s enrolled devices become compliant within 24 hours of Patch Tuesday, and 90% of enrolled devices are compliant within five days,” Digumarthi says.
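Back-of-the-envelope, that shrinks the window to reach 95% compliance by more than an order of magnitude (approximating nine months as 270 days and “less than three weeks” as 21 days):

```python
# Rough comparison of the compliance windows described above.
# Approximation: nine months ~= 270 days; three weeks = 21 days.

before_days = 9 * 30  # ~270 days to reach 95% compliance pre-Hotpatch
after_days = 3 * 7    # under 21 days with Hotpatch

print(round(before_days / after_days, 1))  # ~12.9x shorter exposure window
```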

That’s not just faster. It’s safer.

“We’re reducing the risk window,” Digumarthi says. “From vulnerability discovery to patch deployment, we’re closing the gap—without disrupting users.”

And it’s not just internal. Since general availability in April, Hotpatch has scaled to over 4.5 million devices globally. That growth shows trust and momentum.

It also shows value. Admins spend less time chasing updates. End users stay productive. And security teams get the compliance they need—without the friction.

“Hotpatching eliminates the trade-off between security and productivity,” Deshpande says. “You don’t have to choose anymore.”

Improving the user experience

Hotpatching doesn’t just improve security—it transforms the user experience.

For end users, it’s invisible.

Updates happen in the background.

No pop-ups. No restarts. No performance hits.

“It’s so seamless,” Geissa says. “There’s no bubble. No prompt. It just works.”

For the first few updates, users might see a green banner letting them know they’ve been hotpatched.

A photo of Selvaraj.

“It’s really helpful as an end user; I feel more secure. I don’t need to keep checking and making sure my device is up to date. It just is.”

Senthil Selvaraj, principal group product manager, Microsoft Digital

It’s subtle. It’s clean.

It’s so effective that it’s become a kind of badge among Microsoft insiders.

“It’s really helpful as an end user—I feel more secure,” says Senthil Selvaraj, a principal group product manager at Microsoft Digital. “I don’t need to keep checking and making sure my device is up to date. It just is.”

That’s the magic.

Hotpatching doesn’t interrupt work—it protects it.

It helps other systems stay current too. When the OS is secure, dependent apps and services can update more reliably. That ripple effect improves the overall health of the device.

Admins also see the benefits. Intune reporting shows which devices are ready, which have updated, and which need attention. That visibility helps IT teams track compliance without chasing down machines or relying on manual checks.

For enterprises, it means fewer help desk calls. Fewer complaints. Fewer delays.

Looking forward

Hotpatching is just getting started.

At Microsoft Digital, we’re expanding from 100K to 450K devices in the next four months. That’s nearly every eligible device in our fleet.

Externally, adoption is accelerating. We’ve gone from zero to almost 4.5 million devices since private preview in November 2024. That includes Microsoft and customer fleets, and the number keeps growing.

But scale is just the beginning.

The product team is exploring ways to improve compliance visibility—giving admins deeper insights into patch status, readiness, and impact. That means better reporting, smarter dashboards, and tighter integration with compliance tools.

We’re also working to make adoption easier.

Documentation is improving, Intune reporting is evolving, and we’re building clearer guidance for customers to validate their environments, understand their risk posture, and deploy Hotpatch confidently.

The vision is simple: secure every device, without disruption.

Key takeaways

Here are several key actions you can take to successfully implement Windows Hotpatch in your organization:

  • Check your eligibility and prerequisites. Understand your eligibility and set up the prerequisites in your environment to be hotpatch-capable.
  • Monitor devices and report compliance. Use Intune and other reporting tools to track device readiness, update status, and compliance, even for unmanaged environments.
  • Communicate the benefits to users. Inform users that hotpatching keeps their devices secure with minimal disruption and far fewer required restarts.
  • Deliver a seamless update experience. Emphasize the uninterrupted, restart-free, and performance-neutral nature of updates for users.

The post Transforming security and compliance at Microsoft with Windows Hotpatch appeared first on Inside Track Blog.

Transforming our VPN with Global Secure Access at Microsoft http://approjects.co.za/?big=insidetrack/blog/transforming-our-vpn-with-global-secure-access-at-microsoft/ Thu, 25 Sep 2025 16:00:00 +0000 http://approjects.co.za/?big=insidetrack/blog/?p=20360 Ensuring safe and secure access to resources in the enterprise has always been a delicate balance. Protecting corporate assets from intrusions and misuse is paramount. But a system that neglects usability for employees creates frustration and inefficiencies. At Microsoft, we’re in the midst of a major transformation in how we manage access to our corporate […]

Ensuring safe and secure access to resources in the enterprise has always been a delicate balance. Protecting corporate assets from intrusions and misuse is paramount. But a system that neglects usability for employees creates frustration and inefficiencies.

At Microsoft, we’re in the midst of a major transformation in how we manage access to our corporate resources. The cornerstone of this change is Microsoft Global Secure Access (GSA), a security service edge (SSE) solution that replaces traditional VPNs with a modern, identity-centric model. GSA provides three core services integrated into a unified framework: Microsoft 365 Access, Internet Access, and Private Access. This approach not only strengthens our enterprise security posture but also simplifies connectivity for both users and administrators.

A photo of Apple.

“Years ago, the concept of a VPN was simple: a single virtual private network gave employees access to the company’s entire internal network. Today, this model presents serious risks.”

Pete Apple, principal cloud network engineer, Microsoft Digital

Over 158,000 of our employees are already using the GSA client with Microsoft 365 Access, with full rollout of Private Access and Internet Access planned in the coming months. Here’s how we’re building a more secure, seamless, and future-ready access experience across Microsoft’s ecosystems.

Beyond VPNs: the future of secure access

The idea that an internal network is inherently safer than the open internet has always been risky, and modern threats make that assumption dangerous. This is why we’ve embraced the Zero Trust model, shifting away from blanket access and moving toward least-privilege access—ensuring users only get what they need, when they need it, and nothing more.

Adopting a Zero Trust approach across the enterprise makes moving beyond traditional VPNs imperative. For years, we’ve relied on Microsoft VPN and Azure VPN to access internal resources. While effective, these traditional models operate on an “all-or-nothing” basis: once connected, employees gain broad access, regardless of role or security context.

“Years ago, the concept of a VPN was simple: a single virtual private network gave employees access to the company’s entire internal network,” says Pete Apple, a principal cloud network engineer in Microsoft Digital, the company’s IT organization. “Today, this model presents serious risks. If a user’s identity or device is compromised—or if a man-in-the-middle attack occurs—the attacker can connect through the VPN and gain broad access to sensitive data, soft targets, and critical systems.”

A photo of Triv.

“One of the primary reasons for this shift to GSA is that we get more granularity within this identity-based security solution, so that we can control access on a very fine level.”

Gary Triv, principal network engineer, Microsoft Digital

This creates challenges for organizations like ours—and yours.

That’s where GSA can help.

It shifts the paradigm by introducing fine-grained, identity-based controls. Through deep integration with Microsoft Entra, administrators can enforce policies that adapt in real time, ensuring that only the right users and devices, under the right conditions, gain access to sensitive resources.

“One of the primary reasons for this shift to GSA is that we get more granularity within this identity-based security solution, so that we can control access on a very fine level,” says Gary Triv, a principal network engineer in Microsoft Digital.

The four pillars of GSA security

Our focus on security is built into everything we do.

“Conditional access, identity-centric controls, and other core elements of Zero Trust are built directly into the solution,” says Lalitha Mahajan, global technical program manager for Global Secure Access.

At the heart of GSA are four foundational security features:

  1. Conditional Access (CA): Unlike VPNs, which provide blanket access, CA enforces contextual rules to ensure role-appropriate access at all times. For example, an engineer may be allowed access to a security portal, while another user may only see Power BI dashboards.
  2. Continuous Access Evaluation (CAE): Access control doesn’t stop at login. CAE evaluates user context in real time. If an employee’s role changes, their credentials are revoked, or they leave the company, their active sessions are immediately terminated.
  3. Network Filtering: GSA allows administrators to define exactly where users can go on the internet or within corporate networks. This ensures employees have access only to approved destinations, reducing exposure to threats.
  4. Compliant Network (CN): Access is tied to the source network. For instance, a device in Redmond may be allowed, but the same device in an untrusted region could be blocked automatically.
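To make the interplay concrete, here is a minimal sketch of how the four pillars might compose into a single allow/deny decision. The function, rule names, and data model are illustrative only, not the actual GSA policy engine:

```python
def evaluate_access(user_role, resource, session_valid,
                    destination_allowed, source_network_trusted,
                    role_permissions):
    """Combine the four pillars into one allow/deny decision.

    role_permissions maps role -> set of resources that role may reach
    (Conditional Access). The remaining flags stand in for CAE, network
    filtering, and the compliant-network check, respectively.
    """
    if resource not in role_permissions.get(user_role, set()):
        return "deny: conditional access"        # CA: role-appropriate access only
    if not session_valid:
        return "deny: session revoked"           # CAE: re-evaluated after login
    if not destination_allowed:
        return "deny: destination filtered"      # Network filtering
    if not source_network_trusted:
        return "deny: untrusted source network"  # Compliant network
    return "allow"

# Hypothetical role-to-resource scoping, mirroring the CA example above
perms = {"engineer": {"security-portal"}, "analyst": {"powerbi"}}
```

Note that every check must pass; a revoked session or untrusted source network denies access even when the role itself is entitled to the resource.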

Together, these pillars make GSA a secure and adaptive solution, fully aligned with the principles of Zero Trust.

“With the Zero Trust model, our goal is to enforce least-privilege access. That means locking down internal resources, improving segmentation, and using firewalls and other controls so users can’t reach everything by default,” Apple says. “Instead of relying on a blanket VPN network, we’re moving to the Entra Global Secure Access model, which combines network and identity. Instead of granting broad visibility into the entire internal network, access can now be scoped to a user’s identity—so employees only connect to the resources defined for them.”

A photo of Mahajan.

“Unlike traditional VPNs, GSA delivers both client-side and server-side insights, all of which we own. This gives us deeper visibility and allows us to make the data more actionable for our use cases.”

Lalitha Mahajan, program manager, Microsoft Digital

A perfect example is a Microsoft developer—one of our most common employee roles.

Our developers may need access to specific source code, certain labs, and designated file shares. With GSA, we can grant access only to those resources—and nothing else. This shift from a blanket “once connected, you can see everything” approach, to a tightly defined, identity-based model is a major security improvement and one of the most exciting reasons we’re moving forward with this product.

A key differentiator and critical Zero Trust enabler is GSA’s rich telemetry, which provides real-time visibility into user activity, device health, and network traffic. This continuous stream of data enables early detection of threats, anomaly detection, and precise policy enforcement—strengthening Zero Trust in practice.

“Unlike traditional VPNs, GSA delivers both client-side and server-side insights, all of which we own,” Mahajan says. “This gives us deeper visibility and allows us to make the data more actionable for our use cases.”

The key components of GSA

Private Access is just one of three offerings that make up GSA. Together, these offerings are unified under a single client that creates three dedicated tunnels—one for each service—while administrators centrally define routing and policy rules. GSA consists of:

  • Microsoft 365 Access: Optimized, policy-controlled connectivity for Office apps and services.
  • Internet Access: Secure browsing with TLS inspection, URL filtering, and content controls.
  • Private Access: A modern replacement for legacy VPNs that enables granular access to internal resources.

For Internet Access, GSA supports two deployment models: branch connectivity, where IPSec tunnels secure traffic from devices without a client (like printers), and client connectivity, where the GSA client routes laptop or desktop traffic directly to the GSA Edge. Both approaches enforce consistent policies, differing only in how traffic reaches the framework.

Advanced features and monitoring

Unlike fragmented VPN and firewall logs, GSA provides consistent visibility through unified logging, which consolidates session data—including user identity, device, source, destination, and applied policies—into a single view. We can now easily validate whether security features are working as intended and forward logs to Microsoft Sentinel for extended monitoring.

This holistic view provides us with a major advantage against cyber threats, enabling faster investigations and clearer correlations between user behavior and network activity.

Our rollout of GSA is well underway internally at Microsoft. With more than 158,000 employees already using the GSA client with Microsoft 365 Access, the next phase will expand Private Access company-wide, followed by broader adoption of Internet Access. Early pilots have demonstrated strong results, with positive feedback on both usability and the ability to solve unique access challenges.

By delivering a complete, identity-based secure access solution—spanning Microsoft 365, internet, and private connectivity—Microsoft is redefining enterprise access for the cloud-first era. The result is a future where connectivity is not only seamless but also secure, adaptive, and tightly aligned with user identity and context.

Key takeaways

Our experience transitioning to GSA Private Access has left us with several key insights that other enterprises can apply to their own efforts to modernize remote access:

  • Adopt least-privilege access: Move away from blanket network access to ensure employees only reach the resources they need.
  • Reduce risk from compromised accounts: Limit the blast radius of identity or device breaches by segmenting and scoping access.
  • Continuously evaluate trust: Treat access as dynamic, adapting in real time to changes in user roles, device health, or network conditions.
  • Improve visibility through telemetry: Use detailed activity and traffic data to spot anomalies early and strengthen security decisions.
  • Unify security and connectivity: Align access with identity and context, creating a balance between strong protection and seamless user experience.

The post Transforming our VPN with Global Secure Access at Microsoft appeared first on Inside Track Blog.

Modernizing IT infrastructure at Microsoft: A cloud-native journey with Azure http://approjects.co.za/?big=insidetrack/blog/modernizing-it-infrastructure-at-microsoft-a-cloud-native-journey-with-azure/ Thu, 04 Sep 2025 16:00:00 +0000 http://approjects.co.za/?big=insidetrack/blog/?p=20125 Engage with our experts! Customers or Microsoft account team representatives from Fortune 500 companies are welcome to request a virtual engagement on this topic with experts from our Microsoft Digital team. At Microsoft, we are proudly a cloud-first organization: Today, 98% of our IT infrastructure—which serves more than 200,000 employees and incorporates over 750,000 managed […]


Engage with our experts!

Customers or Microsoft account team representatives from Fortune 500 companies are welcome to request a virtual engagement on this topic with experts from our Microsoft Digital team.

At Microsoft, we are proudly a cloud-first organization: Today, 98% of our IT infrastructure—which serves more than 200,000 employees and incorporates over 750,000 managed devices—runs on the Microsoft Azure cloud.

The company’s massive transition from traditional datacenters to a cloud-native infrastructure on Azure has fundamentally reshaped our IT operations. By adopting a cloud-first, DevOps-driven model, we’ve realized significant gains in agility, scalability, reliability, operational efficiency, and cost savings.

“We’ve created a customer-focused, self-serve management environment centered around Azure DevOps and modern engineering principles,” says Pete Apple, a technical program manager and cloud architect in Microsoft Digital, the company’s IT organization. “It has really transformed how we do IT at Microsoft.”

“Our service teams don’t have to worry about the operating system. They just go to a website, fill in their info, add their data, and away they go. That’s a big advantage in terms of flexibility.”

Apple is shown in a portrait photo.
Pete Apple, technical program manager and cloud architect, Microsoft Digital

What it means to move from the datacenter to the cloud

Historically, our IT environment was anchored in centralized, on-premises datacenters. The initial phase of our cloud transition involved a lift-and-shift approach, migrating workloads to Azure’s infrastructure as a service (IaaS) offerings. Over time, the company evolved toward more of a decentralized, platform as a service (PaaS) DevOps model.

“In the last six or seven years we’ve seen a lot more focus on PaaS and serverless offerings,” says Faisal Nasir, a principal architect in Microsoft Digital. “The evolution is also marked by extensibility—the ability to create enterprise-grade applications in the cloud—and how we can design well-architected end-to-end services.”

Because we’ve moved nearly all our systems to the cloud, we have a very high level of visibility into our network operations, according to Nasir. We can now leverage Azure’s native observability platforms, extending them to enable end-to-end monitoring, debugging, and data collection on service usage and performance. This capability supports high-quality operations and continuous improvement of cloud services.

“Observability means having complete oversight in terms of monitoring, assessments, compliance, and actionability,” Nasir says. “It’s about being able to see across all aspects of our systems and our environments, and even from a customer lens.”

Decentralizing our IT services with Azure

As Microsoft was becoming a cloud-first organization, the nature of the cloud and how we use it changed. As Microsoft Azure matured and more of our infrastructure and services moved to the cloud, we began to move away from IT-owned applications and services.

The strength of Azure’s self-service and management features means that individual business groups can handle many of the duties that Microsoft Digital formerly offered as an IT service provider—which enables each group to build agile solutions to match their specific needs.

“Our goal with our modern cloud infrastructure continues to be a solution that transforms IT tasks into self-service, native cloud solutions for monitoring, management, backup, and security across our entire environment,” Apple says. “This way, our business groups and service lines have reliable, standardized management tools, and we can still maintain control over and visibility into security and compliance for our entire organization.”

The benefits to our businesses of this decentralized model of IT services include:

  • Empowered, flexible DevOps teams
  • A native cloud experience: subscription owners can use features as soon as they’re available
  • Freedom to choose from marketplace solutions
  • Minimal subscription limit issues
  • Greater control over groups and permissions
  • Better insights into Microsoft Azure provisioning and subscriptions
  • Business group ownership of billing and capacity management

“With the PaaS model, and SaaS (software as a service), it’s more DIY,” Apple says. “Our service teams don’t have to worry about the operating system. They just go to a website, fill in their info, add their data, and away they go. That’s a big advantage in terms of flexibility.”

“The idea of centralized monitoring is gone. The new approach is that service teams monitor their own applications, and they know best how to do that.”

Delamarter is shown in a portrait photo.
Cory Delamarter, principal software engineering manager, Microsoft Digital

Leveraging the power of Azure Monitor

Microsoft Azure Monitor is a comprehensive monitoring solution for collecting, analyzing, and responding to monitoring data from cloud and on-premises environments. Across Microsoft, we use Azure Monitor to ensure the highest level of reliability for our services and applications.

Specifically, we rely on Azure Monitor to:

Create visibility. There’s instant access to fundamental metrics, alerts, and notifications across core Azure services for all business units. Azure Monitor also covers production and non-production environments as well as native monitoring support across Microsoft Azure DevOps.

Provide insight. Business groups and service lines can view rich analytics and diagnostics across applications and their compute, storage, and network resources, including anomaly detection and proactive alerting.

Enable optimization. Monitoring results help our business groups and service lines understand how users are engaging with their applications, identify sticking points, develop cohorts, and optimize the business impact of their solutions.

Deliver extensibility. Azure Monitor is designed for extensibility to enable support for custom event ingestion and broader analytics scenarios.
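The proactive-alerting idea can be illustrated with a simple z-score check that flags a metric sample deviating sharply from its recent history. Azure Monitor’s built-in detectors are far more sophisticated; this is only a sketch:

```python
from statistics import mean, stdev

def is_anomalous(history, sample, z_threshold=3.0):
    """Flag a sample that sits more than z_threshold standard
    deviations from the mean of the recent history window."""
    if len(history) < 2:
        return False  # not enough data to judge
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return sample != mu
    return abs(sample - mu) / sigma > z_threshold

# Steady request latencies (ms), then a sudden spike
window = [102, 98, 101, 99, 100, 103, 97]
```

A real detector would also account for seasonality and trend, but the principle is the same: compare each new data point against an expected baseline and alert only on genuine deviations.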

Because we’ve moved to a decentralized IT model, much of the monitoring work has moved to the service team level as well.

“The idea of centralized monitoring is gone,” says Cory Delamarter, a principal software engineering manager in Microsoft Digital. “The new approach is that service teams monitor their own applications, and they know best how to do that.”

Patching and updating, simplified

Moving our operations to the cloud also means a simpler and more automated approach to patching and updating. The shift to PaaS and serverless networking has allowed us to manage infrastructure patching centrally, which is much more scalable and efficient. The extensibility of our cloud platforms reduces integration complexity and accelerates deployment.

“It depends on the model you’re using,” Nasir says. “With the PaaS and serverless networks, the service teams don’t need to worry about patching. With hybrid infrastructure systems, being in the cloud helps with automation of patching and updating. There’s a lot of reusable automation layers that help us build end-to-end patching processes in a faster and more reliable manner.”

Apple stresses the flexibility that this offers across a large organization when it comes to allowing teams to choose how they do their patching and updating.

“In the datacenter days, we ran our own centralized patching service, and we picked the patching windows for the entire company,” Apple says. “By moving to more automated self-service, we provide the tools and the teams can pick their own patching windows. That also allowed us to have better conversations, asking the teams if they want to keep doing the patching or if they want to move up the stack and hand it off to us. So, we continue to empower the service teams to do more and give them that flexibility.”
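That self-service model can be sketched as teams registering their own patch windows while a central guardrail still enforces a minimum cadence. The names and thresholds here are invented for illustration:

```python
from datetime import time

# Hypothetical org-wide guardrail: no team may patch less often than this
MAX_CADENCE_DAYS = 30

def register_window(team_windows: dict, team: str,
                    start: time, end: time, cadence_days: int) -> bool:
    """Accept a team's self-selected patch window if it meets the guardrail."""
    if cadence_days > MAX_CADENCE_DAYS:
        return False  # rejected: patches would land too infrequently
    team_windows[team] = {"start": start, "end": end,
                          "cadence_days": cadence_days}
    return True

windows = {}
```

The key design point is the split of responsibilities: teams choose *when* within their own schedules, while central IT still controls *how often* at minimum.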

Securing our infrastructure in a cloud-first environment

As security has become an absolute priority for Microsoft, it’s also been a foundational element of our cloud strategy.

Being a cloud-first company has made it easier to be a security-first organization as well.

“The cloud enables us to embed security by design into everything we build,” Nasir says. “At enterprise scale, adopting Zero Trust and strong governance becomes seamless, with controls engineered in from the start, not retrofitted later. That same foundation also prepares us for an AI-first future, where resilience, compliance, and automation are built into every system.”

Cloud-native security features combined with integrated observability allow for better compliance and risk management. Delamarter agrees that the cloud has had huge benefits when it comes to enhancing network security.

“Our code lives in repositories now, and so there’s a tremendous amount of security governance that we’ve shifted upstream, which is huge,” Delamarter says. “There are studies that show that the earlier you can find defects and address them, the less expensive they are to deal with. We’re able to catch security issues much earlier than before.”

“There are less and less manual actions required, and we’re automating a lot of business processes. It basically gives us a huge scale of automation on top of the cloud.”

Nasir is shown in a portrait photo.
Faisal Nasir, principal architect, Microsoft Digital

We use Azure Policy, which helps enforce organizational standards and assess compliance at scale using dashboards and other monitoring tools.

“Azure Policy was a key part of our security approach, because it essentially offers guardrails—a set of rules that says, ‘Here’s the defaults you must use,’” Apple says. “You have to use a strong password, for example, and it has to be tied to an Azure Active Directory ID. We can dictate really strong standards for everything and mandate that all our service teams follow these rules.”
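The guardrail concept can be sketched as a set of default rules applied to a resource definition before provisioning. The field names and thresholds below are invented for illustration and don’t reflect Azure Policy’s actual schema:

```python
def check_guardrails(resource: dict) -> list:
    """Return a list of violations; an empty list means the resource passes."""
    violations = []
    # Default: passwords must be strong
    pw = resource.get("admin_password", "")
    if len(pw) < 14:
        violations.append("admin_password: must be at least 14 characters")
    # Default: identity must be tied to a directory ID
    if not resource.get("directory_id"):
        violations.append("directory_id: required for all resources")
    return violations

vm = {"admin_password": "short", "directory_id": None}
ok_vm = {"admin_password": "a-much-longer-passphrase",
         "directory_id": "contoso-tenant"}
```

In practice Azure Policy evaluates rules like these continuously at scale and surfaces the results on compliance dashboards, rather than as a one-time pre-provisioning check.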

AI-driven operations in the cloud

Just like its impact on the rest of the technology world, AI is in the process of transforming infrastructure management at Microsoft. Tasks that used to be manual and laborious are being automated in many areas of the company, including network operations.

“AI is creating a new interface of agents that allow users to interact with large ecosystems of applications, and there’s much easier and more scalable integration,” says Nasir. “There are less and less manual actions required, and we’re automating a lot of business processes. Microsoft 365 Copilot, Security Copilot, and other AI tools are giving us shared compute and extensibility to produce different agents. It basically gives us a huge scale of automation on top of the cloud.”

Apple notes that powerful AI tools can be combined with the incredible amount of data that the Microsoft IT infrastructure generates to gain insights that simply weren’t possible before.

“We can integrate AI with our infrastructure data lakes and use tools like Network Copilot to query the data using natural language,” Apple says. “I can ask questions like, ‘How many of our virtual machines need to be patched?’ and get an answer. It’s early, and we’re still experimenting, but the potential to interact with this data in a more automated fashion is exciting.”

Ultimately, Microsoft has become a cloud-first company, and that has allowed us to work toward an AI-first mentality in everything we do.

“Having a complete observability strategy across our infrastructure modernization helps us to make sure that whatever changes we’re making, we have a design-first approach and a cloud-first mindset,” Nasir says. “And now that focus is shifting towards an AI-first mindset as well.”

Key takeaways

Here are some of the benefits we’ve accrued by becoming a cloud-first IT organization at Microsoft:

  • Transformed operations: By moving from our legacy on-premises datacenters, through Azure’s infrastructure as a service (IaaS) offerings, and eventually to a platform as a service (PaaS) DevOps model, we’ve reaped great gains in reliability, efficiency, scalability, and cost savings.
  • A clear view: With 98% of our organization’s IT infrastructure running in the Azure cloud, we have a huge level of observability into our systems—complete oversight into network assessment, monitoring, compliance, patching/updating, and many other aspects of operations.
  • Empowered teams: Operating a cloud-first environment allows us to have a more decentralized approach to IT infrastructure. This means we can offer our business groups and service lines more self-service, cloud-native solutions for monitoring, management, patching, and backup while still maintaining control over and visibility into security and compliance for our entire organization.
  • Seamless updates: The shift to PaaS and serverless networking has enabled a more planned and automated approach to patching and updating our infrastructure, which produces greater efficiency, integration, and speed of deployment.
  • Dependable security: Our cloud environment has allowed us to implement security by design, including tighter control over code repositories and the use of standard security policies across the organization with Azure Policy.
  • Future-proof infrastructure: As we shift to an AI-first mindset across Microsoft, we’re using AI-driven tools to enhance and maintain our native cloud infrastructure and adopt new workflows that will continue to reap dividends for our employees and our organization.  

The post Modernizing IT infrastructure at Microsoft: A cloud-native journey with Azure appeared first on Inside Track Blog.

Smarter labs, faster fixes: How we’re using AI to provision our virtual labs more effectively http://approjects.co.za/?big=insidetrack/blog/smarter-labs-faster-fixes-how-were-using-ai-to-provision-our-virtual-labs-more-effectively/ Thu, 24 Jul 2025 16:00:00 +0000 http://approjects.co.za/?big=insidetrack/blog/?p=19628 Microsoft Digital stories Providing technical support at an enterprise of our size here at Microsoft is a constant balancing act between speed, quality, and scalability. Systems grow more complex, user expectations continue to rise, and traditional support models often struggle to keep up. Beyond the usual apps and systems everyone uses, many of our employees […]


Microsoft Digital stories

Providing technical support at an enterprise of our size here at Microsoft is a constant balancing act between speed, quality, and scalability. Systems grow more complex, user expectations continue to rise, and traditional support models often struggle to keep up. Beyond the usual apps and systems everyone uses, many of our employees require virtual provisioning for diverse tasks across our businesses. Supporting these virtualized environments is a special challenge.

To meet the growing demand for virtual lab usage across the organization, we turned to AI, not just to automate support responses but to fundamentally rethink how user issues are identified and resolved. This vision came to life through the MyWorkspace platform, where we in Microsoft Digital, the company’s IT organization, introduced a domain-specific AI assistant to streamline how we empower our employees to deploy new virtual labs.

The results have been dramatic: what was once a slow, manual process is now fast, efficient, and frictionless.

But the benefits extend beyond faster resolution times. This transformation represents a new approach to enterprise support—one that uses AI not just as a tool for efficiency, but as a strategic enabler. By embedding intelligence into the support experience, we’re turning complexity into a competitive advantage.

Scaling support in a high-demand environment

MyWorkspace is our internal platform for provisioning virtual labs. Originally developed to support internal testing, diagnostics, and development environments, it has since grown into a critical resource used by thousands of engineers and support personnel across the company.

Scaling the platform infrastructure was straightforward—adding capacity for tens of thousands of virtual labs was a technical challenge we could solve with ease, thanks to our Microsoft Azure backbone. As usage grew, the real strain didn’t show up in CPU load or storage limits, but in the support queue. Every few months, a new wave of users was onboarded to MyWorkspace: partner teams, internal engineers, and external vendors. These new users, often with little experience of the platform, needed fast access and guidance from support.

The questions, though simple, piled up quickly.

Several Tier 1 support engineers repeatedly encountered the same questions from users, such as how to start a lab, what an error meant, and which lab to use for a particular test. These weren’t complex technical issues—they were basic, repetitive onboarding questions that represented a huge opportunity to introduce automation.

“We also found that there were a lot of users who found more niche issues, and those issues had been solved either by our community or by ourselves. In fact, we had a dedicated Teams channel specific to customer issues, and we found that there was a lot of repetition and that other customers were facing similar issues, and we did have a bit of a knowledge base in terms of how to solve these issues.”

A photo of Deans.
Joshua Deans, software engineer, Microsoft Digital

Unblocking a bottleneck with AI

Our support team set out to tackle a familiar but costly challenge: high volumes of low-complexity tickets that consumed valuable time without delivering meaningful impact. Instead of treating this as an unavoidable burden, we saw an opportunity to turn it into a self-scaling solution. If the same questions were being asked repeatedly—and the answers already existed in documentation, internal threads, or institutional knowledge—then an AI system should be able to surface those answers instantly, without human intervention.

“We also found that there were a lot of users who found more niche issues, and those issues had been solved either by our community or by ourselves,” says Joshua Deans, a software engineer within Microsoft Digital. “In fact, we had a dedicated Teams channel specific to customer issues, and we found that there was a lot of repetition and that other customers were facing similar issues, and we did have a bit of a knowledge base in terms of how to solve these issues.”

That insight led the MyWorkspace team to begin building what would become a transformative AI assistant: an automated support layer purpose-built for the MyWorkspace platform. Unlike traditional chatbots that rely on scripted responses or rigid decision trees, this assistant would leverage generative AI trained on a rich dataset of real-world support conversations, internal FAQs, and official documentation.

“So that’s where we found this opportunity to turn this scaling challenge into a scaling advantage, with help from AI. We took all those historical conversations of tier one staff helping new users—trained our AI to provide user education based on that training—and saved our Tier 1 staff from answering potential tickets.”

Vikram Dadwal, principal software engineering manager, Microsoft Digital

The result was a context-aware, responsive system capable of resolving common issues in seconds—not hours or days—dramatically easing the load on support teams while improving the user experience.

Built on Azure and Semantic Kernel

MyWorkspace’s core infrastructure is fully built on Azure services. At any given moment, it manages tens of thousands of virtual machines, scaling up and down with demand. That elasticity, combined with our internal developer tooling and AI orchestration capabilities, provided the perfect environment for an AI-powered support layer.

“So that’s where we found this opportunity to turn this scaling challenge into a scaling advantage, with help from AI,” says Vikram Dadwal, a principal software engineering manager within Microsoft Digital. “We took all those historical conversations of tier one staff helping new users—trained our AI to provide user education based on that training—and saved our Tier 1 staff from answering potential tickets.”

To build the assistant, the team used our Microsoft open-source framework, Semantic Kernel. Designed for generative AI integration, Semantic Kernel allows engineers to create prompt-driven, modular systems that can interact with large language models (LLMs) without vendor lock-in.

This approach gave the team several advantages:

  • Flexibility in choosing and switching between LLM providers.
  • Fine-grained control over how prompts were structured and updated.
  • Extensibility through plugins and actions that tie the assistant into the broader ecosystem.

Crucially, the assistant was designed to be part of the platform’s architecture, capable of operating at the same level of scale and responsiveness as the labs it supported. Also, the assistant was initialized with a well-scoped system prompt, limiting its responses strictly to the MyWorkspace domain.
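
The actual MyWorkspace system prompt isn't published, but the idea of scoping an assistant strictly to one domain can be sketched in a few lines. The topic list, refusal message, and routing logic below are all hypothetical illustrations, not the real implementation:

```python
# Hypothetical sketch of a domain-scoped assistant guard; the real
# MyWorkspace system prompt and topic list are not public.
MYWORKSPACE_TOPICS = {"lab", "vm", "provision", "workspace", "template"}

SYSTEM_PROMPT = (
    "You are the MyWorkspace assistant. Answer only questions about "
    "MyWorkspace labs. Politely decline anything out of scope."
)

def in_scope(question: str) -> bool:
    """Cheap pre-filter: does the question mention a known domain term?"""
    words = question.lower().split()
    return any(topic in word for word in words for topic in MYWORKSPACE_TOPICS)

def route(question: str) -> str:
    """Refuse out-of-scope questions before spending an LLM call."""
    if not in_scope(question):
        return "I can only help with MyWorkspace questions."
    # In the real system, the scoped SYSTEM_PROMPT plus the question
    # would be sent to the LLM here.
    return f"[LLM call with system prompt + question: {question!r}]"
```

A well-scoped system prompt plus a pre-filter like this keeps the assistant on-topic and reduces the surface for prompt misuse.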

“On average, we measured these interactions at around 20 minutes from ticket submission to problem resolution. Now compare that with a 30-second AI interaction for resolving the same class of issues—that’s a 98% reduction in resolution time, a number we’ve validated with our support team and continue to track.”

Nathan Prentice, senior product manager, Microsoft Digital

Shifting from tickets to conversations

Whether users had questions about lab types, needed clarification on configuration details, or sought guidance during onboarding, the AI provided accurate, interactive responses without requiring human escalation. The experience was both faster and significantly better. Support engineers saw a noticeable reduction in repeat tickets, as common issues were resolved on the spot. Onboarding friction decreased, and users were confident that they could get the answers they needed instantly—no ticket, no delay, no need to track a support contact.

“On average, we measured these interactions at around 20 minutes from ticket submission to problem resolution,” says Nathan Prentice, a senior product manager within Microsoft Digital. “Now compare that with a 30-second AI interaction for resolving the same class of issues—that’s a 98% reduction in resolution time, a number we’ve validated with our support team and continue to track.”
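
The reduction Prentice cites follows directly from the two numbers: 20 minutes is 1,200 seconds, so a 30-second AI interaction eliminates 97.5% of the resolution time, which rounds to the ~98% figure:

```python
ticket_seconds = 20 * 60      # ~20 minutes from ticket submission to resolution
ai_seconds = 30               # ~30-second AI interaction

# Fraction of resolution time eliminated by the AI path
reduction = (ticket_seconds - ai_seconds) / ticket_seconds
print(f"{reduction:.1%}")     # 97.5%, i.e. roughly a 98% reduction
```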

Smart, interactive, and intuitive

Our Microsoft Digital team has recently implemented a new version of the MyWorkspace AI assistant that includes several major enhancements. The assistant now features adaptive cards, polished layouts, and a Microsoft 365 Copilot-aligned user experience, making it feel familiar and trustworthy for internal teams. It can also distinguish between a question and an action: if a user says, “Start a SharePoint lab,” it responds with an interactive card and begins provisioning, bridging the gap between passive support and active enablement.

“One of the primary bottlenecks we previously faced in creating an AI solution to address frequently asked user questions was the lack of technology capable of generating accurate answers for complex technical queries and understanding nuanced user input. With the availability of Azure OpenAI models, we were able to effectively overcome this challenge, enabling our AI solution to deliver precise and context-aware responses at scale.”

A photo of Nair.
Anjali Nair, senior software engineer, Microsoft Digital

To guide our employees and improve discoverability, the assistant offers recommended prompts—just like Copilot does—helping new users understand what they can ask and how to get started.

Users can now rate responses, giving a thumbs up or down. These signals are aggregated and reviewed by the engineering team, ensuring continuous improvement and fine tuning over time.
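
Aggregating those thumbs-up/thumbs-down signals is straightforward; the event shape below is hypothetical (the real telemetry schema isn't published), but it shows the kind of rollup an engineering team might review:

```python
from collections import Counter

# Hypothetical feedback events; the actual telemetry schema is not public.
events = [
    {"response_id": "r1", "rating": "up"},
    {"response_id": "r2", "rating": "down"},
    {"response_id": "r3", "rating": "up"},
    {"response_id": "r4", "rating": "up"},
]

def approval_rate(events: list[dict]) -> float:
    """Share of rated responses that received a thumbs up."""
    counts = Counter(e["rating"] for e in events)
    total = counts["up"] + counts["down"]
    return counts["up"] / total if total else 0.0

print(f"{approval_rate(events):.0%}")  # 75%
```

A falling approval rate on a given prompt version is exactly the signal that tells the team which responses to review and which prompts to adjust.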

Intelligent provisioning with multi-agent orchestration 

At Microsoft Digital, we’re reimagining how labs are provisioned by integrating AI-driven intelligence into the process. Traditionally, users are expected to know exactly what kind of lab environment they need. But in complex virtualization and troubleshooting scenarios, these assumptions often fall short. Should a user troubleshooting hybrid issues with Microsoft Exchange spin up a basic Exchange lab, or one that includes Azure AD integration, conditional access policies, and hybrid connectors? To eliminate this guesswork, our team is building a multi-agent system powered by the Semantic Kernel SDK multi-agent framework. This system interprets the user’s support context—often expressed in natural language—and automatically provisions the most relevant lab environment.

For example, a user might say, “I’m seeing sync issues between SharePoint Online and on-prem,” and the assistant will orchestrate the creation of a tailored lab that replicates that exact scenario, enabling faster diagnosis and resolution. With agent orchestration, each agent in the system is specialized: one might handle context interpretation, another lab configuration, and another cost optimization. These agents collaborate to ensure that the lab not only meets technical requirements but is also cost-effective. By leveraging telemetry and historical usage data, the system can recommend leaner configurations—such as using ephemeral VMs, auto-pausing idle resources, or selecting lower-cost SKUs—without compromising diagnostic fidelity. This intelligent provisioning framework is designed to scale, adapt, and continuously learn from usage patterns.
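
The article doesn't publish the agents' implementations, but the division of labor it describes—context interpretation, lab configuration, cost optimization—can be sketched as a simple pipeline of specialized agents. All function names, scenario labels, and SKU values here are illustrative assumptions:

```python
def interpret_context(request: str) -> dict:
    """Context agent: map a natural-language request to a lab scenario."""
    scenario = "sharepoint-hybrid-sync" if "sync" in request.lower() else "generic"
    return {"scenario": scenario}

def configure_lab(context: dict) -> dict:
    """Configuration agent: pick the components the scenario needs."""
    components = {
        "sharepoint-hybrid-sync": ["SharePoint Online", "on-prem farm", "sync connector"],
    }.get(context["scenario"], ["base VM"])
    return {**context, "components": components, "vm_sku": "Standard_D4"}

def optimize_cost(config: dict) -> dict:
    """Cost agent: prefer leaner settings without losing diagnostic fidelity."""
    return {**config, "vm_sku": "Standard_D2", "auto_pause_idle": True}

def provision(request: str) -> dict:
    """Orchestrator: run the specialized agents in sequence."""
    return optimize_cost(configure_lab(interpret_context(request)))

lab = provision("I'm seeing sync issues between SharePoint Online and on-prem")
```

In a production system each agent would be an LLM-backed component (for example, built on the Semantic Kernel agent framework) rather than a keyword match, but the orchestration shape is the same.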

“One of the primary bottlenecks we previously faced in creating an AI solution to address frequently asked user questions was the lack of technology capable of generating accurate answers for complex technical queries and understanding nuanced user input,” says Anjali Nair, a senior software engineer within Microsoft Digital. “With the availability of Azure OpenAI models, we were able to effectively overcome this challenge, enabling our AI solution to deliver precise and context-aware responses at scale.”

With multi-agent orchestration, we’re taking a step towards a future where lab environments are not just automated, but intelligently orchestrated, context-aware, and cost-optimized—empowering engineers to focus on solving problems, not setting up infrastructure.

Scaling support without scaling headcount

The MyWorkspace assistant is a powerful example of how enterprise support can evolve through intelligence. By embedding AI into the support experience, we’ve turned complexity into a competitive edge—reshaping knowledge work and operations through AI’s problem-solving capabilities. As Microsoft advances as a Frontier Firm, MyWorkspace shows how we can scale support on demand, with intelligence built in. Routine queries are offloaded to AI, freeing Tier 1 teams to focus on critical issues and giving Tier 2 engineers space to innovate. Most importantly, support now scales with user demand—not headcount.

But this system does more than just respond—it learns. Every interaction becomes a data point. Each resolved issue feeds back to the assistant, sharpening its accuracy and expanding its knowledge. What started as a reactive Q&A tool is now growing into a proactive orchestrator that surfaces insights and points users to solutions, resolving issues before they ever become tickets.

“We have a lot more telemetry now, so users can provide feedback to our responses—for example, thumbs up or thumbs down feedback,” Deans says. “And we can actually view where the model is giving incorrect or inappropriate information, and we can use that to make adjustments to the prompt provided to the model.”

In this model, support becomes a seamless extension of the user experience. With the right AI architecture in place, it transforms a cost center into a strategic asset. The MyWorkspace assistant fulfills its role as an embedded, intelligent teammate—delivering answers, driving actions, and continuously improving over time.

Ultimately, our journey with MyWorkspace shows that meaningful AI adoption doesn’t have to begin with sweeping transformation. Sometimes, it starts with a helpdesk queue, a recurring issue, and the choice to build something smarter—something that learns, adapts, and empowers at every step.

Key takeaways

Here are some of our top insights from boosting our internal deployment of MyWorkspace with AI and continuous improvement.

  • Start small and specific. Focus on a defined domain—like MyWorkspace—and use existing support logs to train your assistant.
  • Invest in AI infrastructure. Tools like Semantic Kernel provide flexibility, especially in enterprise settings where vendor neutrality and customization matter.
  • Design for trust. Align your assistant’s UI with well-known systems like Microsoft Copilot to build user confidence.
  • Don’t wait for perfection. Launch a V1, gather feedback, and make improvements. AI assistants get better over time if you let them learn.
  • Think outside the ticket queue. The future isn’t just faster support—it’s intelligent, anticipatory systems that eliminate friction before it begins.

The post Smarter labs, faster fixes: How we’re using AI to provision our virtual labs more effectively appeared first on Inside Track Blog.

Securing the borderless enterprise: How we’re using AI to reinvent our network security http://approjects.co.za/?big=insidetrack/blog/securing-the-borderless-enterprise-how-were-using-ai-to-reinvent-our-network-security/ Thu, 10 Jul 2025 16:00:00 +0000 http://approjects.co.za/?big=insidetrack/blog/?p=19504 The modern enterprise network is complex, to say the least. Enterprises like ours are increasingly adopting hybrid infrastructures that span on-premises data centers, multiple cloud environments, and a diverse array of remote users. In this context, traditional security tools are still playing checkers while the malicious actors are playing chess. To make matters worse, attacks […]

The post Securing the borderless enterprise: How we’re using AI to reinvent our network security appeared first on Inside Track Blog.

The modern enterprise network is complex, to say the least.

Enterprises like ours are increasingly adopting hybrid infrastructures that span on-premises data centers, multiple cloud environments, and a diverse array of remote users. In this context, traditional security tools are still playing checkers while the malicious actors are playing chess. To make matters worse, attacks are increasingly enabled by AI tools.

That’s why here in Microsoft Digital, the company’s IT organization, we’re using a modern approach and toolset—including AI—to secure our network environment, turning complexity into clarity, one approach, tool, and insight at a time.

Leaving traditional network security behind

For years, traditional network security relied on a simple but increasingly outdated assumption: everything inside the corporate perimeter can be trusted. This model made sense when networks were static, users were on-premises, and applications lived in a centralized data center.

But that world is gone.

A photo of Venkatraman.

“Implicit trust must be replaced with explicit verification. That means rethinking how we monitor, how we respond, and how we design for resilience from the start.”

Raghavendran Venkatraman, principal cloud network engineering manager, Microsoft Digital

Today’s enterprise is dynamic, decentralized, and borderless. Hybrid work has become the norm. Cloud adoption is accelerating. Teams are globally distributed. Devices and data move constantly across environments. In this new reality, the network perimeter hasn’t just shifted—it has effectively vanished.

That’s where the cracks in legacy security models become impossible to ignore.

Visibility becomes fragmented. Security teams struggle to track what’s happening across a sprawling digital estate. Traditional monitoring tools focus on infrastructure uptime or device health—not on the actual experience of the people using the network. That disconnect creates blind spots, and blind spots create risk.

We know that this model no longer meets the needs of a modern, AI-powered enterprise. Every enterprise needs a new approach—one that assumes breach, enforces least-privilege access, and continuously verifies trust.

“Implicit trust must be replaced with explicit verification,” says Raghavendran Venkatraman, a principal cloud network engineering manager in Microsoft Digital. “That means rethinking how we monitor, how we respond, and how we design for resilience from the start.”

This shift is foundational to our security strategy. It’s not just about securing infrastructure—it’s about securing the experience. Because in a world where users, data, and threats are everywhere, trust has to be proved, not assumed.

Building a resilient and adaptive security strategy

To secure hybrid corporate networks effectively, organizations must go beyond traditional perimeter defenses. They need a comprehensive and adaptive security strategy—one that evolves with the threat landscape and aligns with the complexity of modern enterprise environments. The diversity of hybrid networks introduces new vulnerabilities and expands the attack surface. A static, one-size-fits-all approach simply doesn’t work anymore.

At Microsoft Digital, we’ve embraced a layered, cloud-first security model that integrates identity, access, encryption, and monitoring across every layer of the network. It’s embedded in everything we do. This model includes these key strategies, which we’ll expand upon in the following sections:

  • Adopting Zero Trust principles
  • Establishing identity as the new perimeter 
  • Integrating AI and machine learning
  • Enforcing network segmentation
  • Embracing continuous monitoring

Adopting Zero Trust principles

Zero Trust Architecture (ZTA) operates on a strict principle: “never trust, always verify.” That means no user, device, or application—whether it’s inside or outside the corporate network—is inherently trusted as they are in the traditional network security model.

A photo of McCleery.

“Zero Trust isn’t a product—it’s a mindset. It’s about assuming breach and designing defenses that minimize impact and maximize resilience.”

Tom McCleery, principal group cloud network engineer, Microsoft Digital

Every access request is evaluated against dynamic policies. These policies consider several factors—like user identity, device health, location, and how sensitive the data being accessed is. For example, if an employee tries to access a financial report from a corporate laptop at the office, they might get in, no problem. But that same request from a personal device in another country could get blocked or trigger extra authentication steps.
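
The financial-report example can be expressed as a policy function over the signals listed above—identity, device health, location, and data sensitivity. This is an illustrative sketch of the decision logic, not Microsoft's actual policy engine:

```python
from dataclasses import dataclass

@dataclass
class AccessRequest:
    user_role: str
    device_managed: bool   # device health / management state
    location: str          # e.g. "office", "home", "foreign"
    data_sensitivity: str  # e.g. "low", "high"

def evaluate(req: AccessRequest) -> str:
    """Return 'allow', 'mfa' (step-up authentication), or 'deny'."""
    if req.data_sensitivity == "high":
        if req.device_managed and req.location == "office":
            return "allow"
        if req.device_managed:
            return "mfa"   # managed device, but from an unusual location
        return "deny"      # personal device requesting sensitive data
    return "allow" if req.device_managed else "mfa"

# Corporate laptop at the office: allowed.
office = evaluate(AccessRequest("finance", True, "office", "high"))
# Personal device in another country: blocked.
abroad = evaluate(AccessRequest("finance", False, "foreign", "high"))
```

Real policy engines weigh many more signals (risk scores, session context, sign-in history), but every request flows through the same kind of explicit decision rather than an implicit grant.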

At the heart of ZTA are policy enforcement points that authorize every data flow. These checkpoints only grant access when all conditions are met, and they log every interaction for auditing and threat detection. This kind of granular control reduces the attack surface and limits lateral movement if there is a breach.

Adopting Zero Trust isn’t just a technical upgrade—it’s a strategic must. It boosts an organization’s ability to defend against modern threats like ransomware, insider attacks, and supply chain compromises.

“Zero Trust isn’t a product—it’s a mindset,” says Tom McCleery, a principal group cloud network engineer in Microsoft Digital. “It’s about assuming breach and designing defenses that minimize impact and maximize resilience.”

By embracing Zero Trust, we strengthen our security posture, lower the risk of data breaches, and respond more effectively to emerging threats.

Establishing identity as the new perimeter

Identity is no longer just a component of security—it has become the new perimeter. Traditional security models focused on defending the network edge, assuming that everything inside the perimeter could be trusted. But in today’s hybrid and cloud-first environments, the perimeter has dissolved and that assumption is outdated and dangerous. Users, devices, and applications now operate across diverse locations and platforms, making perimeter-based defenses insufficient.

Identity-first security shifts the focus from securing the physical network to securing the identities—both human and machine—that interact with the network. This means every access request is treated as though it originates from an untrusted source, regardless of where it comes from. Whether it’s a remote employee logging in from a personal device or an automated workload accessing cloud resources, the system must verify who or what is making the request, assess the risk, and enforce least-privilege access across the user experience.

This approach enables organizations to implement more granular access controls. For example, a developer might be allowed to access a code repository but not production systems, and only during business hours from a managed device.  Similarly, a service account used by a Continuous Integration and Continuous Deployment CI/CD pipeline might be restricted to specific APIs and monitored for anomalous behavior. A CI/CD pipeline is an automated workflow that takes code from development through testing and into production.
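
The service-account case can be sketched as a simple allow-list check. The account and API names below are hypothetical, and a real system would also monitor call patterns for anomalies rather than only gating individual calls:

```python
# Hypothetical least-privilege scoping for a CI/CD service account:
# the account may call only an explicit allow-list of APIs.
ALLOWED_APIS = {
    "ci-pipeline-sa": {"repos.read", "artifacts.write", "tests.run"},
}

def authorize(account: str, api: str) -> bool:
    """Grant the call only if the API is on the account's allow-list."""
    return api in ALLOWED_APIS.get(account, set())

assert authorize("ci-pipeline-sa", "artifacts.write")
assert not authorize("ci-pipeline-sa", "prod.deploy")   # outside its scope
assert not authorize("unknown-sa", "repos.read")        # unknown identity
```

The default-deny shape matters: an identity not on the list gets nothing, which is the least-privilege principle applied to machine identities.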

By anchoring network security around verified identities, organizations reduce their attack surface and improve their ability to detect and respond to threats. This identity-centric model is not just a security enhancement—it’s a strategic shift that aligns with how modern enterprises operate.

Integrating AI and machine learning 

AI and machine learning (ML) are foundational pillars in our network security strategy. Intelligent automation and advanced analytics help us not only detect and respond to threats, but also continuously improve our security posture in an ever-changing landscape. Here’s how we’re using AI and ML in some critical aspects of our approach to modern network security:

  • Threat detection and intelligence. We deploy AI-powered monitoring tools that sift through billions of network signals and logs across our hybrid infrastructure. By applying sophisticated ML algorithms, we can identify abnormal behaviors such as unusual login attempts or unexpected data transfers that could indicate a potential breach. These insights allow our security teams to focus on the most critical alerts, reducing noise and accelerating incident investigation.
  • Automated response and containment. Through automation, our security systems can respond to threats in real time. For example, if our AI models detect suspicious activity on a device, automated workflows can immediately isolate the affected endpoint, block malicious traffic, or revoke access privileges, all without waiting for manual intervention. This rapid response capability is essential for minimizing the potential impact of attacks and protecting our critical assets.
  • Predictive analysis and proactive defense. We use predictive analytics to forecast emerging vulnerabilities before they can be exploited. By continuously training our models on the latest threat intelligence and attack patterns, we can anticipate risks and strengthen our defenses proactively—whether that means patching vulnerable systems, adjusting access controls, or updating our security policies.
  • User experience monitoring. We use AI to assess the real experience of our users, a critical measurement in a network environment where identity is the perimeter. By correlating performance metrics with security signals, we ensure that our security mechanisms don’t degrade productivity and that any anomalies impacting user experience are promptly addressed.
  • Continuous learning and improvement. Our AI and ML systems are designed to learn from every incident, adapt to new attack techniques, and evolve with the threat landscape. This continuous improvement loop enables our teams to stay ahead of sophisticated adversaries and maintain robust, resilient network security.
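
As one concrete (and deliberately simplified) example of the kind of signal described above, an unusual spike in failed sign-ins can be flagged with a basic z-score over a historical baseline. Production detection models are far more sophisticated; this sketch only shows the statistical idea:

```python
import statistics

def is_anomalous(history: list[int], current: int, threshold: float = 3.0) -> bool:
    """Flag the current count if it sits more than `threshold` standard
    deviations above the historical mean of failed sign-ins per hour."""
    mean = statistics.mean(history)
    stdev = statistics.pstdev(history) or 1.0  # avoid division by zero
    return (current - mean) / stdev > threshold

baseline = [4, 5, 6, 5, 4, 6, 5, 5]   # typical failed sign-ins per hour
print(is_anomalous(baseline, 5))       # False: a normal hour
print(is_anomalous(baseline, 40))      # True: likely credential attack
```

In a real pipeline, a flag like this would feed the automated-response workflows described above (isolating an endpoint or forcing reauthentication) rather than just printing a boolean.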

Advanced threats require advanced responses. By integrating AI and ML into our network security strategies, we’re enhancing our ability to detect and respond to threats swiftly, minimize potential damage, and foster a secure environment for innovation and collaboration across our global hybrid infrastructure.

Isolating networks to minimize risk

In a hybrid infrastructure, isolating network segments is a foundational security principle. By segmenting networks, we limit the scope of potential breaches and reduce the risk of lateral movement by attackers. For example, separating employee productivity networks from customer-facing systems ensures that if a vulnerability is exploited in one area, it doesn’t cascade across the entire environment.

This is especially critical in environments where sensitive customer data and internal development systems coexist. Our testing and development environments must remain completely isolated—not only from customer-facing services but also from internal productivity tools like email, collaboration platforms, and identity systems. This prevents test code or experimental configurations from inadvertently exposing production systems to risk.

We also establish policy enforcement points (PEPs) within each network segment. These act as control gates, inspecting and filtering traffic between zones. By placing PEPs at strategic boundaries, we can tightly control what moves between segments and detect anomalies early. This architecture ensures that, if a breach occurs, the “blast radius”—the scope of impact—is minimal and contained.
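
The PEP concept can be sketched as a small policy matrix over named segments. A real enforcement point inspects far more than the segment pair, and the segment names here are illustrative, but the default-deny, log-everything core looks like this:

```python
# Illustrative allowed flows between network segments; anything not
# listed is denied by default and logged by the enforcement point.
ALLOWED_FLOWS = {
    ("corp-productivity", "internet"): True,
    ("corp-productivity", "customer-facing"): False,
    ("test-dev", "customer-facing"): False,
    ("test-dev", "corp-productivity"): False,
}

audit_log: list[tuple[str, str, str]] = []

def enforce(src: str, dst: str) -> bool:
    """Policy enforcement point: default-deny, and log every decision."""
    allowed = ALLOWED_FLOWS.get((src, dst), False)
    audit_log.append((src, dst, "allow" if allowed else "deny"))
    return allowed

enforce("corp-productivity", "internet")   # allowed
enforce("test-dev", "customer-facing")     # denied: isolated by design
```

Because unlisted flows are denied, a compromised test-dev machine cannot reach customer-facing systems at all—the "blast radius" stays inside its own segment.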

This layered approach to segmentation and isolation is essential for maintaining the integrity of our production systems, minimizing risk, and ensuring that our hybrid infrastructure remains resilient in the face of evolving threats.

Embracing continuous monitoring 

We’ve stopped thinking of monitoring as a one-time check. Now, it’s a continuous conversation with our network.

A photo of Singh.

“Conventional network performance monitoring—monitoring the systems and infrastructure that support our network—can only tell part of the story. To truly understand and meet our requirements, we must monitor user experiences directly.”

Ragini Singh, partner group engineering manager, Microsoft Digital

Continuous monitoring is how we stay ahead of issues before they impact our people. It’s how we keep our hybrid infrastructure resilient, performant, and secure—every second of every day.

We’ve built a monitoring ecosystem that spans our entire global network, from on-premises offices to cloud-based services in Azure and software-as-a-service (SaaS) platforms. With the mindset that identity is the new perimeter, we’re using signals from all aspects of our environment and focusing on the user experience.

“Conventional network performance monitoring—monitoring the systems and infrastructure that support our network—can only tell part of the story,” says Ragini Singh, a partner group engineering manager in Microsoft Digital. “To truly understand and meet our requirements, we must monitor user experiences directly.”

This isn’t just about tools and dashboards. It’s about insight. We’re using synthetic and native metrics to build a hop-by-hop view of the user experience. That lets us pinpoint where things go wrong—and fix them fast. We’re even layering in automation to enable self-healing responses when thresholds are breached.
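
A self-healing response of the kind described here amounts to comparing hop-by-hop measurements against thresholds and triggering remediation automatically. The hop names, threshold, and remediation action below are hypothetical:

```python
# Hypothetical self-healing check: if a hop's measured latency breaches
# its threshold, queue an automated remediation instead of paging a human.
THRESHOLD_MS = 150

def check_hops(latencies_ms: dict[str, float]) -> list[str]:
    """Return remediation actions for hops that breach the latency threshold."""
    actions = []
    for hop, latency in latencies_ms.items():
        if latency > THRESHOLD_MS:
            actions.append(f"reroute traffic around {hop}")
    return actions

measured = {"branch->edge": 35.0, "edge->azure-front-door": 210.0}
print(check_hops(measured))  # ['reroute traffic around edge->azure-front-door']
```

The value of the hop-by-hop view is exactly this localization: the system knows *which* segment degraded, so the automated response can target it without touching healthy paths.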

Continuous monitoring is a strategic shift that helps us protect our people, power our services, and deliver the seamless experience our employees expect.

Looking to the future

As enterprises continue to navigate the complexities of hybrid infrastructures, securing enterprise networks requires an agile, multifaceted approach that integrates Zero Trust principles, identity-first security, and advanced technologies like AI and ML. By shifting the focus from traditional perimeter defenses to a more holistic and adaptive security model, organizations can better protect their assets, maintain operational continuity, and foster innovation in an increasingly interconnected world.

Implementing these strategies not only enhances security but also positions organizations to leverage the full potential of their hybrid infrastructures, driving growth and success in the digital age.

Key takeaways

Here are five key actions you can take to strengthen your organization’s network security and embrace a modern approach:

  • Adopt an identity-first security model. Shift your focus from traditional perimeter-based defenses to verifying and securing every user and device identity—regardless of location or network.
  • Integrate AI and machine learning into your security strategy. Continuously improve your security posture by using intelligent automation and analytics to detect, respond to, and predict threats more effectively.
  • Isolate network segments to minimize risk. Separate critical business functions, customer-facing services, and development environments to contain threats and ensure that any potential breach remains limited in scope.
  • Implement continuous monitoring across your hybrid infrastructure. Move beyond periodic checks by establishing real-time, user-centric monitoring to maintain resilience, performance, and rapid incident response.
  • Embrace a proactive, adaptive mindset. Regularly update your security policies, train your teams, and stay agile to address emerging threats and support innovation as your organization evolves.

The post Securing the borderless enterprise: How we’re using AI to reinvent our network security appeared first on Inside Track Blog.

Five principles that guided our network journey to Microsoft Azure and the cloud at Microsoft http://approjects.co.za/?big=insidetrack/blog/five-principles-that-guided-our-network-journey-to-microsoft-azure-and-the-cloud-at-microsoft/ Thu, 19 Jun 2025 16:05:00 +0000 http://approjects.co.za/?big=insidetrack/blog/?p=19387 At Microsoft, we operate one of the world’s largest IT infrastructures. So, when we embarked on the journey nearly a decade ago to move from a primarily on-premises network of physical servers to one that now operates almost entirely in the Azure cloud, it was a mammoth undertaking. And like all long and rewarding journeys, […]

The post Five principles that guided our network journey to Microsoft Azure and the cloud at Microsoft appeared first on Inside Track Blog.


At Microsoft, we operate one of the world’s largest IT infrastructures. So, when we embarked on the journey nearly a decade ago to move from a primarily on-premises network of physical servers to one that now operates almost entirely in the Azure cloud, it was a mammoth undertaking.

And like all long and rewarding journeys, this one led to many important insights. We’d like to share five overarching principles that we learned along the way with our customers, most of whom are somewhere in the midst of their own organizational transformation into a cloud-first company, or who may be contemplating such a move.

By delineating our guiding principles and major takeaways from our own journey to the cloud, we at Microsoft Digital—the company’s IT organization—hope that other companies can learn from our experience and have a smoother and more efficient transition of their own, saving time, money, and effort.

“Our customers can learn from us having gone through it,” says Pete Apple, a technical program manager and cloud architect in Microsoft Digital. “Because we didn’t do it right the first time, at all. And so that learning process of, ‘This is what we did, this is how we did it, this is what you should think about’ can help them consider their own options.”

Stages in our journey to the cloud

1 to 6 months
  • 10% migrated
  • Retire unused workload
  • Small apps
  • IaaS—lift and shift

(IaaS = Infrastructure as a Service)

7 to 18 months
  • 28% migrated
  • Reduce multiple environments
  • Small and mid-sized apps
  • IaaS and PaaS

(PaaS = Platform as a Service)

19 to 36 months
  • 74% migrated
  • Large, more complex apps
  • Focus on PaaS

37 to 48 months
  • 90%+ migrated
  • Largest, most complex apps
  • Design cloud-native apps

Our journey to transform our on-premises IT infrastructure to a system based in the Microsoft Azure cloud took roughly four years, and we continue to innovate and refine our approach today.

Be vision-led and metric-driven

When setting off on a years-long journey, you don’t just walk out the door with a vague idea of where you’re going. As we embarked on this years-long project, our leadership laid out the overall vision that guided our project plans.

“Our leadership was critical; they gave us the vision of, ‘We’re going to migrate to the cloud, and we want to be first and best. We’re going to be an example for the rest of the industry,’” Apple says. “They made a big bet on it, and then they put the support behind it to hold the teams accountable, tracking against the goals and metrics. This directive went all the way up to (Microsoft CEO) Satya Nadella; it was an absolute priority from his point of view.”

Apple, Basem, and O’Flaherty are shown in a composite photo.
Pete Apple (left to right), Basma Basem, and Martin O’Flaherty are Microsoft employees who played important roles in our transformative journey to a cloud-based IT infrastructure.

Martin O’Flaherty, a principal PM manager at Microsoft Digital, explained how important it was that senior leadership stuck to the vision and remained patient during the long journey to the cloud.

“Our executive vice president took the long view of this project, and he backed us as we took the time to work through all the issues and all the times when things failed,” O’Flaherty says. “We had to ‘embrace the red’ by talking about those failures rather than cover things up, in order to keep learning throughout the process. Leadership made it clear that doing the job right was the priority, and that trust gave us the confidence to stay focused and deliver.”

As far as metrics are concerned, consider the size of the Microsoft digital landscape: more than 220,000 employees in over 100 countries using more than 750,000 devices. Moving a supporting infrastructure of this size to the cloud required careful attention to specific metrics throughout the process, both to carefully measure progress and to understand the biggest challenges and potential obstacles along the way.

“We had something like 800-plus different services across the company that we had to deal with in our journey to the cloud, which I like to call the total footprint,” Apple says. “We had to track how many of them were in the cloud, how many were on-premises, and how many were hybrid. And we kept track of that quarter by quarter. We also had to monitor things like the spend for on-prem versus the cloud, and our quality metrics such as service-level agreements and customer satisfaction ratings. We had to keep an eye on all of it.”
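
Tracking that footprint quarter by quarter is simple arithmetic once each service is classified. The counts below are illustrative, not Microsoft's actual numbers (the article cites 800-plus services in total):

```python
from collections import Counter

# Illustrative classification of a handful of services by hosting model.
services = {"svc-a": "cloud", "svc-b": "on-prem", "svc-c": "hybrid",
            "svc-d": "cloud", "svc-e": "cloud"}

def footprint(services: dict[str, str]) -> dict[str, float]:
    """Share of the total footprint in each hosting category."""
    counts = Counter(services.values())
    total = len(services)
    return {cat: counts[cat] / total for cat in ("cloud", "on-prem", "hybrid")}

print(footprint(services))  # {'cloud': 0.6, 'on-prem': 0.2, 'hybrid': 0.2}
```

Recomputing this each quarter, alongside spend and quality metrics, gives leadership the same progress view the team describes.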

Pay attention to people, processes, and technology

Moving a large IT infrastructure to the Azure cloud is a technology challenge, but it’s just as important to think about the people and the processes involved.

“It’s not just about getting everything moved from on-premises servers to a cloud solution,” Apple says. “Once you have it there, it’s about what your staff should look like, the different roles and skills you’ll need to run things in the cloud. Then, how do you plan for the day-to-day operation of it? What kind of processes and monitoring do you need?”

O’Flaherty notes that of these three considerations, transforming your people resources for the move to the cloud might be the biggest task.

“When we talk about ‘people change,’ we mean how people do their work—and frankly, that’s usually the hardest challenge,” O’Flaherty says. “Once we had good momentum in moving our technology to the cloud, we needed to change how people do their work. We needed to modernize.”

Apple says that transitioning the people skills of the organization was a deliberate process.

“We provided training, and we made it very clear that everyone needed to learn to work with infrastructure as code, rather than physical machines,” he says. “And whenever we had the ability to hire new people, we prioritized those DevOps skills. We invested in that, because that was the direction we were going.”
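The shift Apple describes, from managing physical machines to infrastructure as code, means declaring desired state as data and letting tooling compute the changes needed to reach it. A minimal, language-agnostic sketch of that idea follows; the resource names and fields are hypothetical, not a real Azure template schema:

```python
# Infrastructure as code in miniature: desired state is declared as data,
# and a reconciliation step computes what must change. The resource names
# and fields below are hypothetical, not a real Azure schema.

desired = {
    "vm-web-01": {"size": "Standard_D2s", "region": "westus2"},
    "vm-web-02": {"size": "Standard_D2s", "region": "westus2"},
}

actual = {
    "vm-web-01": {"size": "Standard_D2s", "region": "westus2"},
}

def plan(desired_state, actual_state):
    """Return the create/update/delete actions needed to reach the desired state."""
    actions = []
    for name, spec in desired_state.items():
        if name not in actual_state:
            actions.append(("create", name))
        elif actual_state[name] != spec:
            actions.append(("update", name))
    for name in actual_state:
        if name not in desired_state:
            actions.append(("delete", name))
    return actions

print(plan(desired, actual))  # [('create', 'vm-web-02')]
```

Real tools such as ARM templates, Bicep, or Terraform implement this reconcile loop at scale, which is why the skills investment Apple mentions pays off: engineers review and version the declaration rather than hand-configure machines.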

Sometimes, technology decisions are also what enable more effective processes. O’Flaherty explains how one specific decision during the cloud journey made it possible to implement best-practice processes that ensured quality standards were met.

“We decided to use one single instance of Azure DevOps. So, all of our teams—across more than 800 applications—and all our code repos were in one Azure DevOps account,” O’Flaherty says. “This setup allowed us to implement consistent engineering standards, like requiring every code change to be reviewed by two people. Because we could enforce these policies across the board, we achieved a new level of consistency, accountability, and confidence in our development process.”
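In Azure DevOps, a standard like "every code change is reviewed by two people" is enforced through branch policies, which can be created programmatically. The sketch below builds a payload in the general shape the policy REST API accepts; note that the policy type GUID is a placeholder (the real value comes from the service's policy-types endpoint), and the exact field names should be checked against the Azure DevOps REST API reference:

```python
def reviewer_policy(repo_id, branch, min_reviewers=2):
    """Build a minimum-reviewers branch-policy payload in the general shape
    used by the Azure DevOps policy REST API. The policy type ID below is a
    placeholder GUID; the real one comes from the policy-types endpoint."""
    return {
        "isEnabled": True,
        "isBlocking": True,  # blocking policies gate pull-request completion
        "type": {"id": "00000000-0000-0000-0000-000000000000"},  # placeholder
        "settings": {
            "minimumApproverCount": min_reviewers,
            "creatorVoteCounts": False,  # authors can't approve their own change
            "scope": [
                {
                    "repositoryId": repo_id,
                    "refName": f"refs/heads/{branch}",
                    "matchKind": "Exact",
                }
            ],
        },
    }

policy = reviewer_policy("example-repo-guid", "main")
print(policy["settings"]["minimumApproverCount"])  # 2
```

Because all 800-plus applications lived in one Azure DevOps instance, a payload like this could be applied uniformly to every repository, which is what made the consistency O'Flaherty describes enforceable rather than aspirational.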

Confront legacy applications and technical debt

A major technological transformation, like moving an on-premises infrastructure to the cloud, is also the perfect opportunity to deal with the aging legacy applications and technical debt that have accumulated within the organization.

Dealing with legacy applications up front means you can reduce the total load of what you end up moving to the cloud.

“The first thing we asked was, ‘What do we not need anymore?’” O’Flaherty says. “We were able to identify something like 30% of tools and services that could be retired or consolidated. We also looked at other SaaS solutions as replacements for things we were building ourselves, which removed about 15% more of the portfolio. So we had almost halved the total burden at that point.”
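The arithmetic O'Flaherty describes is easy to check. Using the roughly 800-service portfolio cited earlier in the article and the percentages he gives, retiring 30% and replacing another 15% with SaaS leaves a little over half the original burden:

```python
total_services = 800      # approximate portfolio size cited earlier in the article
retired = 0.30            # share retired or consolidated
replaced_by_saas = 0.15   # share replaced with first- or third-party SaaS

remaining = total_services * (1 - retired - replaced_by_saas)
print(round(remaining))   # 440 -- roughly half the original portfolio
```

That 45% reduction is why the article can describe the migrated footprint as "almost halved" before any lift-and-shift work even began.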

Strategic approach for moving our IT infrastructure to the cloud

Graphic shows the different segments of our network services in terms of how they are handled during the move from on-premises to the cloud.
One key benefit of moving our IT infrastructure to the Microsoft Azure cloud was that we were able to strategically reduce the total number of services we eventually moved to the cloud by nearly half. This was achieved by eliminating legacy services, addressing accumulated technical debt, and adopting first- and third-party SaaS (software as a service) solutions instead of lifting and shifting existing applications to the cloud.

Apple explains the benefits of starting with a clean slate when you move to the cloud.

“There’s always that backlog of work items and legacy things, and the idea is that you don’t want to bring your bad habits with you to the cloud,” Apple says. “So, if you’ve got a solution that is still using COBOL or Windows 2008, maybe it’s time to pull off the Band-Aid? That’s a good investment of your developer capacity.”

Microsoft also faced significant challenges during the early stages of the journey to the cloud in addressing years of technical debt, which O’Flaherty describes as technical issues resulting from past development decisions that weren’t as robust or maintainable as they could have been.

“We knew the scale of the technical debt we had—it was kind of like an iceberg, with a huge amount of work below the surface. And we knew it was going to take several years to get through it all,” he says. “The key was understanding that we were going to have to invest a significant amount of engineering time to get there—that we needed to put 30% to 40% of our engineering resources behind this effort for well over a year just to get on top of the problem. We had to take that hit up front, or we’d still be in the same boat today.”

Transform your operations with end-to-end thinking

In the old world of on-premises network infrastructure, services were often siloed. Different departments ran their own systems and tools, and employees couldn’t always access data and technologies that were needed to gain a bigger picture or develop cross-disciplinary solutions.

Enter the cloud-based network, which opens up the ability for end-to-end thinking and working.

“In the old days, the interactions between applications were pretty monolithic,” Apple says. “With the move to the cloud and engineering modernization, you open up new kinds of compute and access to data. Developers can use APIs, containers, Power Apps and more to access the various data lakes we have across the company. There’s a lot more flexibility, and they can work much faster.”

Another area where having a cloud-based network allows us to take more of an end-to-end approach is security, which has become a major priority at Microsoft in recent years.

“End-to-end thinking means I can do a multi-layer defense and comprehensive security implementation in the cloud,” says Basma Basem, a senior program manager in Microsoft Security. “I can make sure that there’s a security implementation from an architecture and design standpoint on each layer of the services I’m building in the Azure cloud. And you have such a wide variety of security solutions in the cloud, it makes it much easier to find the right solution and ensure that you have good security posture management.”

Consistently prioritize your goals and metrics

When it comes to tackling such an enormous project, it’s vital to understand your priorities and keep them front and center as you move through the process.

“We had a lot of priorities around financial considerations in moving away from the physical infrastructure model,” Apple says. “That was number one. Then we had priorities around efficiency and modernization. And we had to find ways to measure those priorities and ensure we were hitting our targets.”

Of course, prioritization also means that you can’t take on all your challenges at once. Your leads have to make sure that they communicate effectively so everyone understands the priorities, the pace of progress, and when different issues will be addressed.

“There’s a tendency to kind of try to boil the ocean and fix everything at once,” O’Flaherty says. “We really had to temper people’s expectations, even within our own leadership, and say that this is going to take a while. If there were 50 compliance problems, we couldn’t tackle all 50 at the same time—the leads would identify the top 3, and we’d do those 3, then move on to the next batch. We really had to set specific goals and follow our metrics along the way.”

And there’s one overall metric that Apple likes to keep top-of-mind when discussing what moving our network to the Azure cloud has meant for Microsoft—cost.

“We’re spending 20% less on our infrastructure costs than we did when we were operating on-premises,” Apple says. “When you look at what we were spending on physical infrastructure versus today, in the cloud, it’s a significant savings.”

Every cloud journey has its own path

Today, we operate roughly 98 percent of the Microsoft corporate infrastructure in the cloud, and we are continually looking for strategies to be more efficient, more automated, and less costly. Apple notes that the company decided to push hard to get to this level (“to the point of heartburn for some people”) and show what was possible, but that not every organization will need or want to go this far in their own cloud transition.

“We are the extreme in terms of pushing the bar,” Apple says. “We’ve been very innovative in this space, because we wanted to prove our point in terms of how much we could put on the cloud. We realize every business has to make tradeoffs, and some may want to keep a certain percentage of their infrastructure still on-premises. But the flexibility of the cloud and the cost savings are real, and we want our customers to understand that and take advantage of it.”

Key takeaways

Here are some of the major insights we took from the process of moving our network into the Azure cloud:

  • Confront your technical debt. Be prepared to do the upfront work of addressing your technical backlog and getting into a better state before you make the transition to the cloud. You’ll not only avoid major headaches—you’ll also reduce the total network footprint that you’ll be moving.
  • Invest for the long term. Leadership has to be willing to devote significant resources over the course of the project, and to understand that the results might not be realized in the short term. But the overall payoff will be worth it once you’ve completed the work.
  • Get employees on board. Make training and upskilling a priority as you transition your workforce to a cloud-first mindset. Incorporate the shift into individual reviews and goal-setting so that everyone is pointed in the right direction.
  • Take the opportunity to instill a “secure by default” philosophy. As you move to the cloud, you can proactively create and deploy a strong security architecture, keep compliance requirements top of mind, continuously monitor your organization’s security posture, and foster a culture where everyone factors security risk into their work and decision making.
  • Embrace “the red.” Create a culture where teams are comfortable with revealing when they are falling short on their metrics (being “in the red”). Being open about those issues will help others avoid the same pitfalls in their own areas and significantly increase overall quality.
  • Keep your goals and metrics front and center. On a long and complicated journey, it’s vital to keep everyone focused on the destination—your goals, sometimes called objectives and key results (or OKRs). Defining and carefully tracking the right metrics (also known as key performance indicators, or KPIs) is another essential part of this process.

The post Five principles that guided our network journey to Microsoft Azure and the cloud at Microsoft appeared first on Inside Track Blog.

]]>