Transforming Microsoft’s enterprise network with next-generation connectivity

Identity-focused security models, software-defined infrastructure, and network-as-code are enabling Microsoft to provide next-gen connectivity to its employees.

Next-generation connectivity is enabling us to transform our internal enterprise network here at Microsoft.

Deploying a more agile, secure, and effective network environment across Microsoft is empowering our employees to thrive in our new hybrid world. This article describes how we’re implementing this new network strategy, including goals, action areas, expected results, and a brief evaluation of anticipated future state and immediate next steps.

And this transformation is coming at a good time.

The need for digital transformation is evident across all industries, and our now largely mobile and remote workforce has created challenges and new requirements for our network environment.

Fortunately, employee productivity and satisfaction with connectivity have remained high despite these challenges, even as remote work has become the new normal. During the pandemic, for example, we, the Microsoft Digital (MSD) team, tallied up to 190,000 unique remote connections daily.

Our next-generation connectivity strategy must account for both traditional in-building experience and hybrid experiences, and it must accommodate critical industry factors driving the usage of our network resources, including:

  • The increasing speed and scale of cyberattacks.
  • The adoption of the cloud as our primary endpoint for business applications, security monitoring, and employee productivity—supplanting traditional on-premises infrastructure and network patterns.
  • The continuing impact of remote work and emerging hybrid scenarios.
  • A surge of Internet of Things (IoT) devices and sensors on the network in Microsoft buildings and campuses, driven by workplace modernization efforts.

With these factors in mind, we’re making changes to adapt to new traffic patterns, use cases, security requirements, and other demands of our network infrastructure. Legacy approaches to network operations won’t provide adequate service and security.

We’re embracing cloud WAN and edge security models associated with user identities, device states, and applications that aren’t directly dependent on the physical network infrastructure. We’re efficiently scaling network deployment and operations with investments in software-defined infrastructure, network automation, data-driven insights, and AIOps.

Software-defined infrastructure has brought network and application security to the places where end users can consume them when needed, regardless of where they’re physically located. Automation and AIOps have started to eliminate some manual operations and could eventually encompass every part of the engineering and operational life cycle, including the capability to respond quickly to changing business needs and live site incidents.


Focus points for next-generation connectivity

Transforming our network to support next-generation connectivity requires us to reimagine many of our traditional connectivity models. We’re focusing on several different areas to help us reduce our legacy network dependencies and move to a more modern connectivity design:

  • Shift to the internet as the primary network transport. We’re using the internet as the primary network for most of our end-user and remote-office traffic. More than 90 percent of the corporate data infrastructure that the average knowledge worker at Microsoft uses resides in the cloud, and using the internet creates simpler, more fluid access to cloud-based infrastructure and services. Microsoft Entra Private Access solutions use the internet to bring end-user traffic into our software-defined cloud edge, removing dependencies on legacy VPN solutions. Microsoft Azure Virtual Desktop, Windows 365 Cloud PC, and other virtualization solutions allow users to connect securely to corporate resources without using VPN.
  • Reduce our corporate network footprint and dependencies. We’re transforming our enterprise perimeter from fixed resources and limited locations to a grouping of Zero Trust segments and services, such as secure access service edge (SASE) and Microsoft Azure Firewall, that provide dynamic edge capabilities widely available at all our cloud points of presence. Zero Trust moves beyond the notion of the physical network as a security perimeter and replaces it with identity and device-health evaluation in a set of services delivered from the cloud. Modern access controls will be able to apply policies independent of network location and connectivity method. Our users must be able to securely connect to corporate applications and services from any network unless explicitly prohibited. Controls and monitoring must be oriented toward the resources that we want to protect, not only the infrastructure on which those resources reside.
  • Reduce MPLS WAN infrastructure and dependencies. Next-generation connectivity shifts traffic away from traditional Multiprotocol Label Switching Wide Area Network (MPLS WAN) aggregation hubs into a more distributed, internet-based transport model. Our firewall, network edge, and VPN aggregation points will no longer require MPLS backbone connectivity. The internet is the network of choice, and our users access their applications at the endpoint by using secure delivery and modern cloud edge logic. Moving our engineering build systems into the cloud will greatly reduce the need for large-capacity private backbone connectivity.
  • Align with emerging security standards. Increasingly sophisticated threats that target our critical systems and supply chain have further reinforced our Zero Trust networking strategy. Introducing stronger macro- and micro-segmentation of the network to protect development and test pipelines will further advance the segmentation concepts introduced in Zero Trust.
  • Decrease wired user access infrastructure for end-user devices. Shifting end-user devices such as laptops from wired infrastructure to wireless helps simplify connectivity and reduce technical debt and ongoing funding requirements. Wireless networking is the default connection method across our campuses for user-focused devices, and in many cases our wireless network supersedes our wired network infrastructure and makes it redundant. We’re investing in wireless technology for new deployments to reduce costs and foster an identity-driven and device-driven security approach across our network, while we repurpose our wired infrastructure for devices that require Power over Ethernet (PoE), such as building management systems, IoT devices, and digital signage.
  • Adopt modern network management practices. As our infrastructure continues to increase in complexity and size, reliance on traditional network management practices adds risk to service quality, resource efficiency, and our employee experience. We’re consolidating the scope of our management systems with an Azure-focused approach, using scalable cloud network manageability solutions where feasible. We’re also capitalizing on the vast amount of telemetry we collect to create self-healing networks that automatically and intelligently predict and address customer needs and potential platform failures using AIOps. As our network becomes increasingly software-defined, increased rigor around DevOps practices will improve the quality of check-ins, create safe deployments, and detect customer experience degradation.

Key design and architecture principles

We’re informing our strategy with several architecture and design principles for the network that will help us address our focus points while we develop the next-generation connectivity model. The key principles are outlined in the following sections.

Enforce, enhance, and expand Zero Trust

A Zero Trust architecture model reduces the risk of lateral movement in any network environment. Identities are validated and secured with multifactor authentication (MFA) everywhere. Using MFA eliminates password expirations and, eventually, passwords. The added use of biometrics ensures strong authentication for user-backed identities.

Enforceable device-health controls protect the network and require applications and services to observe and enforce our device-health policies. The health of all services and applications must be monitored to ensure proper operation and compliance and enable rapid response when those conditions aren’t met.

Least privilege access principles and network segmentation ensure that users and devices have access only to the resources required to perform their role. Both Microsoft Entra ID authentication and conditional access will play a crucial role in establishing a Zero Trust model. The network will also supply mechanisms such as wired port security and an 802.1X solution that allows users to register their devices for elevated access.
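To make these principles concrete, here’s a minimal sketch of a deny-by-default access decision that combines identity, device health, and least privilege. It’s a hypothetical model in Python, not the Microsoft Entra ID conditional access engine or its API: the roles, resources, `AccessRequest` fields, and `evaluate_access` function are all illustrative assumptions.

```python
from dataclasses import dataclass

# Hypothetical least-privilege map: each role reaches only the resources it needs.
ROLE_RESOURCES = {
    "finance-analyst": {"erp-reporting"},
    "network-engineer": {"network-telemetry", "config-repo"},
}


@dataclass(frozen=True)
class AccessRequest:
    user: str
    role: str
    mfa_satisfied: bool      # identity verified with multifactor authentication
    device_compliant: bool   # device currently meets device-health policy
    resource: str


def evaluate_access(req: AccessRequest) -> tuple[bool, str]:
    """Return (allowed, reason). Every check must pass; anything else is denied."""
    if not req.mfa_satisfied:
        return False, "MFA required"
    if not req.device_compliant:
        return False, "device does not meet health policy"
    if req.resource not in ROLE_RESOURCES.get(req.role, set()):
        return False, "resource outside role scope"
    return True, "allowed"


print(evaluate_access(AccessRequest("alice", "finance-analyst", True, True, "erp-reporting")))
# (True, 'allowed')
```

Evaluating every condition on every request, rather than trusting a prior network connection, is what keeps access independent of where the request comes from.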

Adopt an internet-first, wireless-first approach for connectivity

The internet will be the default connectivity method for laptops, mobile devices, IoT applications, and most end-user processes. We’ll migrate remote offices to use the internet as their primary transport for end-user connectivity, replacing most of our MPLS hub and backbone services.

Some core workloads, such as supply chain, high-risk engineering, and specific development workloads, will still require private connectivity in the future. For these workloads, a logical overlay can provide private connectivity over a physical underlay such as the internet. Most users and applications, however, will connect to the cloud directly via the internet.

Employee workflows are moving toward flexible use of space both inside and outside the building, and that’s best facilitated through wireless connectivity.

Implement distributed, strong network services segmentation

As we migrate remote-office connectivity to internet-first, distributed services and strong network segmentation will be crucial. We’ll replace localized services such as DNS and firewall with distributed, cloud-based services, or deliver those services from a dedicated shared-services network segment.

The shared-services segment will enable users to consume services through firewall-protected data-plane access. This segment will allow access only to the ports and protocols that end devices need to deliver their required functionality.

This segment will host typical network services such as DNS and DHCP, management platforms, and security solutions such as identity platforms for authentication.
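A firewall-protected segment that permits only required ports and protocols is essentially a default-deny allowlist. The Python sketch below models that idea. The rule set is a hypothetical example built from the standard ports for DNS (53), the DHCP server (UDP 67), and Kerberos (88), not our actual firewall policy, and `Rule` and `is_permitted` are illustrative names.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    service: str
    protocol: str  # "tcp" or "udp"
    port: int


# Hypothetical allowlist for the shared-services segment: only the ports and
# protocols end devices need to reach DNS, DHCP, and authentication services.
SHARED_SERVICES_ALLOWLIST = [
    Rule("dns", "udp", 53),
    Rule("dns", "tcp", 53),
    Rule("dhcp", "udp", 67),
    Rule("kerberos", "tcp", 88),
    Rule("kerberos", "udp", 88),
]


def is_permitted(protocol: str, port: int, allowlist=SHARED_SERVICES_ALLOWLIST) -> bool:
    """Default deny: traffic is permitted only if it matches an allowlist entry."""
    return any(r.protocol == protocol.lower() and r.port == port for r in allowlist)
```

Anything not explicitly listed, such as remote desktop on TCP 3389, is dropped without needing a rule of its own.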

Optimize network connectivity for integrated voice/data services

There has been a recent acceleration in the digital transformation of our user experience and customer service interactions. Microsoft support services are accessible to customers through our Omni-Channel User Experience, which includes not only voice, chat, email, and social channels but also video interactions.

Real-time interactions will be served by peer-to-peer routing and by the ability of our voice services to take the most direct route possible. Where available, network connectivity for voice services will include class of service (CoS), bandwidth allocation and management, route optimization, and network security.

Internet-first will be our preferred path for call routing, except at global locations where a private connection for voice is deemed necessary to guarantee quality of service (QoS) for our user experience.
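The internet-first routing rule, with its exception list and class-of-service marking, can be sketched as a small decision function. The site names and the `route_realtime_media` helper are hypothetical. The DSCP values are the code points commonly used for real-time audio (Expedited Forwarding, 46) and video (AF41, 34); they’re shown as an assumption about how CoS might be applied, not as a description of our configuration.

```python
from dataclasses import dataclass

# Hypothetical exception list: sites where a private voice connection has been
# deemed necessary to guarantee QoS. Site names are illustrative.
PRIVATE_VOICE_SITES = {"site-a"}

# DSCP code points commonly used to mark real-time media:
# 46 = Expedited Forwarding (EF) for audio, 34 = AF41 for video.
MEDIA_DSCP = {"audio": 46, "video": 34}
BEST_EFFORT_DSCP = 0


@dataclass(frozen=True)
class RouteDecision:
    path: str  # "internet" or "private"
    dscp: int  # class-of-service marking applied to the media stream


def route_realtime_media(site: str, media: str) -> RouteDecision:
    """Internet-first by default; private transport only for listed exception sites."""
    path = "private" if site in PRIVATE_VOICE_SITES else "internet"
    return RouteDecision(path, MEDIA_DSCP.get(media, BEST_EFFORT_DSCP))
```

Keeping the exception list as data rather than logic means adding or retiring a private-voice site doesn’t change the routing code.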

Build software-defined, intelligent infrastructure

Software-defined infrastructure and network-as-code initiatives will create a more stable and agile network environment. We’re automating our network-provisioning processes so that they require minimal touch and, in some cases, achieve zero-touch network provisioning. Network Intent will maintain network and device configuration after provisioning.
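Maintaining configuration against an intended state usually comes down to comparing what a device should have with what it actually has, then computing the changes needed to converge. Here’s a minimal Python sketch of that comparison, assuming device configuration can be represented as flat key-value settings. It doesn’t describe how Network Intent itself is implemented, and `diff_config` and the setting names are hypothetical.

```python
def diff_config(intended: dict[str, str], observed: dict[str, str]) -> dict[str, dict]:
    """Compare a device's intended and observed settings.

    Returns the settings to add, remove, and change so that the device
    converges back to its intended state. An empty result means no drift.
    """
    to_add = {k: v for k, v in intended.items() if k not in observed}
    to_remove = {k: v for k, v in observed.items() if k not in intended}
    to_change = {
        k: (observed[k], intended[k])  # (current value, intended value)
        for k in intended.keys() & observed.keys()
        if intended[k] != observed[k]
    }
    return {"add": to_add, "remove": to_remove, "change": to_change}


intended = {"ntp": "10.0.0.1", "snmp": "v3", "vlan10": "users"}
observed = {"ntp": "10.0.0.9", "vlan10": "users", "telnet": "enabled"}
print(diff_config(intended, observed))
```

Running the same comparison on a schedule turns configuration drift into a list of concrete remediation steps rather than a manual audit.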

End users will be able to request their own preapproved network connectivity using just-in-time access with self-service capabilities through exposed APIs and user interfaces. We’re decoupling the network logical overlay from the physical underlay to allow users to connect to applications and allow devices to talk to each other on dedicated segments across our network while still adhering to security policies. Software-defined infrastructure will supply the flexibility needed to easily interchange connectivity modes and maintain the logical overlay without significant dependency on the network physical underlay.
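Self-service, just-in-time connectivity can be modeled as a catalog of preapproved requests, each with a maximum duration, and grants that expire automatically. This Python sketch is hypothetical: the catalog entries, the `Grant` type, and `request_access` are illustrative and don’t correspond to any real Microsoft API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical catalog of preapproved (device type, segment) requests and the
# longest duration each may be granted for without further review.
PREAPPROVED = {
    ("lab-device", "dev-segment"): timedelta(hours=8),
    ("laptop", "shared-services"): timedelta(hours=24),
}


@dataclass(frozen=True)
class Grant:
    device_type: str
    segment: str
    expires_at: datetime

    def is_active(self, now: datetime) -> bool:
        return now < self.expires_at


def request_access(device_type: str, segment: str, hours: int, now: datetime) -> Optional[Grant]:
    """Grant preapproved requests immediately, capped at their maximum duration;
    refuse anything that isn't in the catalog."""
    max_duration = PREAPPROVED.get((device_type, segment))
    if max_duration is None:
        return None  # not preapproved; would need a separate review path
    return Grant(device_type, segment, now + min(timedelta(hours=hours), max_duration))


now = datetime.now(timezone.utc)
grant = request_access("lab-device", "dev-segment", hours=12, now=now)
print(grant.expires_at - now)  # 8:00:00 (capped at the preapproved maximum)
```

Because every grant carries its own expiry, access lapses by default instead of depending on someone remembering to revoke it.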

Intelligent monitoring, diagnosis, and self-healing capabilities

Data analytics applied to both real-time and historical telemetry help engineers to better understand the current state of our network infrastructure and anticipate future needs. Centralizing and analyzing the vast amount of telemetry we collect across the network helps us more efficiently detect, diagnose, and mitigate incidents to reduce their effects on customers.
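One common way to apply historical telemetry to real-time detection is to compare each new sample against a rolling baseline and flag large deviations. This minimal Python sketch uses a z-score against a sliding window. The window size, the three-standard-deviation threshold, and the `BaselineDetector` class are illustrative assumptions, not a description of our production analytics.

```python
from collections import deque
from statistics import mean, pstdev


class BaselineDetector:
    """Flag samples that deviate sharply from a rolling historical baseline."""

    def __init__(self, window: int = 60, threshold: float = 3.0, min_samples: int = 10):
        self.history = deque(maxlen=window)  # most recent `window` samples
        self.threshold = threshold           # allowed deviation in standard deviations
        self.min_samples = min_samples       # don't judge until the baseline has data

    def observe(self, value: float) -> bool:
        """Return True if `value` is anomalous relative to recent history."""
        anomalous = False
        if len(self.history) >= self.min_samples:
            mu = mean(self.history)
            sigma = pstdev(self.history)
            if sigma > 0:
                anomalous = abs(value - mu) / sigma > self.threshold
            else:
                anomalous = value != mu  # flat baseline: any change stands out
        self.history.append(value)
        return anomalous
```

In this version, anomalous samples still enter the baseline. A production detector might exclude them so that a sustained incident doesn’t quickly become the new normal.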

Automated incident correlation will reduce pager storms and incident noise for on-call engineers. Auto-correlation learns from historical incident co-occurrence patterns, using a combination of AI and machine learning techniques, to correlate incoming alerts with active incidents.
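The idea behind auto-correlation can be illustrated without machine learning by counting how often alert types have historically appeared in the same incident. The Python sketch below swaps in that simple co-occurrence frequency in place of the AI and machine learning techniques described above. The alert type names, the 0.5 threshold, and the `CoOccurrenceCorrelator` class are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, Optional


class CoOccurrenceCorrelator:
    """Frequency-based stand-in for learned alert correlation: an incoming alert
    joins an active incident if its type has historically co-occurred with the
    incident's alert type often enough."""

    def __init__(self, threshold: float = 0.5):
        self.threshold = threshold
        self.type_counts: Counter = Counter()  # incidents each alert type appeared in
        self.pair_counts: Counter = Counter()  # incidents each pair appeared in together

    def learn(self, historical_incidents: Iterable[Iterable[str]]) -> None:
        """Each historical incident is the collection of alert types that fired in it."""
        for alert_types in historical_incidents:
            types = set(alert_types)
            self.type_counts.update(types)
            self.pair_counts.update(frozenset(p) for p in combinations(sorted(types), 2))

    def score(self, new_type: str, existing_type: str) -> float:
        """Estimated probability that new_type appears given existing_type fired."""
        if new_type == existing_type:
            return 1.0
        seen = self.type_counts[existing_type]
        if seen == 0:
            return 0.0
        return self.pair_counts[frozenset((new_type, existing_type))] / seen

    def correlate(self, alert_type: str, active_incidents: dict[str, str]) -> Optional[str]:
        """Return the best-matching active incident id, or None to open a new incident."""
        best_id, best_score = None, 0.0
        for incident_id, incident_type in active_incidents.items():
            s = self.score(alert_type, incident_type)
            if s >= self.threshold and s > best_score:
                best_id, best_score = incident_id, s
        return best_id
```

Attaching a correlated alert to an existing incident, instead of paging for it separately, is what cuts down on pager storms.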

Incident root cause analysis assists on-call engineers with diagnosing incidents in near real-time. Failure correlation helps engineering teams identify and fix service failures by correlating failure events with telemetry. In the future, we’ll move towards self-healing by using data and analytics to perform predictive problem management and proactive mitigation.

Secure the network infrastructure

The security controls that ensure the confidentiality, integrity, and availability of the underlying network infrastructure, and of the control and management systems for the network hardware, will continue to evolve to protect against advanced persistent threats, vulnerabilities, and exploitation attempts. Zero-day vulnerabilities and supply-chain risks are accelerating our strategies to keep the foundation of network services resilient.

Administrator access to infrastructure and management tools requires the use of hardened workstations, certificate-only identities, and a multiple-approver access-request model to ensure that least privilege access is enforced. We’ll continue to segment network infrastructure and management tools to limit network attack surface.
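The multiple-approver access-request model can be sketched as a request that’s granted only when it comes from a hardened workstation with a certificate-based identity and enough distinct approvers, other than the requester, have signed off. The two-approver default and all names below are hypothetical, chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class AdminAccessRequest:
    requester: str
    target: str                       # infrastructure or management tool requested
    from_hardened_workstation: bool   # request originates from a hardened workstation
    certificate_identity: bool        # requester authenticated with a certificate-only identity
    required_approvals: int = 2       # hypothetical approval quorum
    approvers: set[str] = field(default_factory=set)

    def approve(self, approver: str) -> None:
        """Record an approval; a set ensures each approver counts only once."""
        if approver == self.requester:
            raise ValueError("requesters cannot approve their own access")
        self.approvers.add(approver)

    def is_granted(self) -> bool:
        return (
            self.from_hardened_workstation
            and self.certificate_identity
            and len(self.approvers) >= self.required_approvals
        )
```

Requiring distinct approvers, and rejecting self-approval outright, keeps any single compromised account from unlocking administrative access on its own.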

Network infrastructure control design will continue to focus on creating admin resources that are unreachable and undiscoverable from the internet, other network infrastructure elements, or any other segment except for approved management tools.

Examining network future state

While next-generation connectivity involves our entire network infrastructure, our implementation will significantly affect several aspects of the network environment. We’ve established guidelines and anticipated future states for these areas to better define the ongoing implications for our network. These aren’t mandates, but rather suggestions for where we can focus in the future to optimize our investments while we keep our long-term goals in mind.

Backbone

The layout of our backbone will stay in parallel configuration with the Azure backbone as our services and applications increasingly migrate into Microsoft Azure. However, we expect our private backbone needs to decrease as we continue to adopt the Azure backbone and move our network edge to the cloud. Our end-user segment has limited impact on the required capacity for our backbone.

As we continue to isolate our engineering workloads and migrate them into the Azure cloud following our Zero Trust principles, the required backbone capacity will be reduced further. In development-intensive regions, these engineering workloads account for the vast majority of backbone capacity.

MPLS WAN

Decoupling the underlay from overlay technologies for MPLS WAN will allow us to deploy more agile and dynamic physical connectivity. This will remove the requirement for manually built static point-to-point connections. We’ll use a software-defined overlay layer while permitting the flexibility of a high-bandwidth internet connection as the physical underlay.

Replacing existing MPLS connectivity isn’t the primary objective. We want to provide our users with the connectivity that they need in the most efficient way, with the best and most secure user experience. Where it makes sense, or where a connection is up for renewal, we’ll examine the optimal underlay, which might be a connection based on Azure Virtual WAN (VWAN) rather than a point-to-point or MPLS connection.

We’ll continue to evaluate more agile connectivity methods for our offices that best serve our objective of a cloud-first, mobile-first model driven by a programmable, intelligent infrastructure.

End-user segments

As we migrate to the cloud, the typical traffic flow is from the end user to the public internet and Microsoft Azure. Using the internet for our end-user segments enables our users to reach the Azure cloud over the shortest network path possible and to use the capacity and global connectivity of Azure’s network to find the optimal path to their destination or application. Dedicated internet access (DIA) connections or shared internet connections provide the optimal solution for many offices.

Developer segments

Developer and test environments require isolation from unrelated workflows to ensure security and high confidence in product integrity. These environments also have a diverse range of client capabilities. Some are fully managed for ease of administration, while others run unmanaged devices to support performance testing and to simplify debugging of products under test.

By segmenting these environments away from broader shared networks, development teams can build and test their respective products in isolation. The goal is to allow communication with the bare minimum number of endpoints needed to validate code end to end, to have the flexibility to host clients in various states of management within the segment, and to reduce exposure to and from other outside environments.

Private segments

We use several private segments that are physically isolated from each other but still require logical connectivity between these physical locations. Examples of these include:

  • Infrastructure Admin Network. Admin interfaces for our network devices, hosted with very strict access controls.
  • Facilities Modern. Modern IoT devices that use internet-only connectivity methods, such as Microsoft Azure IoT Hub.
  • Facilities Legacy. IoT devices that still require connectivity to the corporate network for proper operation, such as legacy phones, video-conferencing equipment, or printers.

To connect segments, we’ll need to rely on an overlay solution like Microsoft Azure Virtual WAN (VWAN). When we replace WAN links with internet-only underlay connectivity, the any-to-any connectivity model isn’t available via the underlay. If a device in office A needs to communicate with a device in office B hosted on the same segment, VWAN will provide the software-based overlay solution to make this possible.
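Conceptually, the overlay gives each segment its own routing table that maps destination prefixes to the site tunnels behind them, so devices on the same segment can reach each other across offices while other segments stay isolated. The Python sketch below models that with a per-segment longest-prefix match. It doesn’t use or mirror the Azure Virtual WAN API; the segment names, prefixes, and `OverlayRouter` class are hypothetical.

```python
import ipaddress
from typing import Optional


class OverlayRouter:
    """Segment-aware overlay routing table. Each segment keeps its own routes,
    so a destination advertised in one segment is unreachable from another."""

    def __init__(self) -> None:
        self.tables: dict[str, list] = {}  # segment -> [(network, site), ...]

    def advertise(self, segment: str, prefix: str, site: str) -> None:
        """Record that `prefix` in `segment` lives behind the tunnel to `site`."""
        self.tables.setdefault(segment, []).append((ipaddress.ip_network(prefix), site))

    def next_hop(self, segment: str, destination: str) -> Optional[str]:
        """Longest-prefix match within one segment; None means no route."""
        addr = ipaddress.ip_address(destination)
        matches = [
            (net, site)
            for net, site in self.tables.get(segment, [])
            if net.version == addr.version and addr in net
        ]
        if not matches:
            return None
        return max(matches, key=lambda m: m[0].prefixlen)[1]
```

A lookup for a Facilities Legacy device in office B succeeds only within the Facilities Legacy table, even though the physical underlay is the same internet connection every segment shares.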

Approaching the internet edge

Most of our network segments will use an internet edge. This shift alone won’t affect the overall load on edge processing, but we do expect that load to shift fundamentally from our traditional edges to a more distributed model. We’ll also have more security delivered programmatically from the cloud in tandem with our intelligent infrastructure initiative, and we’ll reevaluate the need for large, centralized edge stamps. Edge use and configuration will be influenced by several factors, including:

  • End-user devices will use localized internet breakout.
  • More security controls are moving into the endpoint itself.
  • The design and rollout of a developer segment will have a major impact on our edge in certain locations, where developer traffic makes up the majority (90 percent) of the required capacity.
  • Security enforcement will move into the cloud if cloud-based SASE solutions become the new norm for our company.

Transforming VPN

Our current VPN solution is a hardware-based investment. Although it has proven stability and delivered great value to the company during the COVID-19 pandemic period, classic on-or-off VPN behavior that provides access to all or none of the private applications doesn’t adequately support the Zero Trust model.

Replacing our current VPN environment isn’t a standalone goal. We need to address uncontrolled connectivity in an open corporate network environment that allows all devices to communicate with one another.

The future of our remote access is a blend of direct access to Microsoft Azure-based applications and limited-scope access to legacy on-premises resources. Strong identity-based security and monitored traffic flows are solution requirements. We’ll facilitate this access through multiple technology solutions instead of a single service.

SASE solutions will allow conditional access to applications via the internet, based on Microsoft Entra ID authentication. This allows for effective micro-segmentation where it’s necessary. In parallel, there will be use cases for Microsoft Azure Virtual Desktop and Windows 365 Cloud PC to provide selective access to applications, for example as an alternative to an extranet connection. A SASE agent can also be installed on these virtual desktops to offer a cloud-based PC to vendors (or full-time employees), providing virtual hardware in a software-defined environment.
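The difference between all-or-nothing VPN and per-application access can be shown in a few lines. In this hypothetical Python sketch, the app names, group names, and both functions are illustrative; real SASE and Microsoft Entra ID policies are configured in those products, not in code like this.

```python
# Hypothetical private applications and the groups allowed to reach each one.
APP_POLICIES = {
    "hr-portal": {"hr"},
    "build-dashboard": {"engineering"},
    "expense-app": {"hr", "engineering", "finance"},
}


def vpn_reachable_apps(connected: bool) -> set[str]:
    """Classic on-or-off VPN: once connected, every private app is reachable."""
    return set(APP_POLICIES) if connected else set()


def sase_reachable_apps(groups: set[str], device_compliant: bool) -> set[str]:
    """Per-application access: each app is evaluated separately against the
    user's groups and device state, so reach is scoped app by app."""
    if not device_compliant:
        return set()
    return {app for app, allowed in APP_POLICIES.items() if groups & allowed}


print(sorted(sase_reachable_apps({"engineering"}, device_compliant=True)))
# ['build-dashboard', 'expense-app']
```

With the VPN model, an engineer who connects can also reach the HR portal; with per-application evaluation, that path simply doesn’t exist.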

Any reduction in VPN will be contingent on a massive effort to lift and shift hundreds of applications that depend on the corporate network. These applications need to become internet facing, whether through a proxy solution or through an intentional migration to the cloud. As these workloads shift, we’ll be able to reduce daily VPN usage and eventually eliminate it entirely for most of the information worker persona.

In-building connectivity

With the continued shift to wireless, wired Ethernet connectivity in buildings must diverge into two distinctly different use cases and fulfillment methods: concentrated wired centers for the remaining high-bandwidth workflows, and wired Ethernet for IoT devices, where power and data convergence needs are emerging with Power over Ethernet (PoE).

Over the years, pervasive wired Ethernet throughout our facilities has had an unintended consequence: it has fostered unique developer roles and workloads that have inhibited the consolidation of labs, content hosts, and critical development components into centralized real estate spaces.

This has created a cycle of over-provisioning wired networking infrastructure in each newly constructed building, so that capacity is there if it’s ever needed, and has led to today’s drastically underutilized switching infrastructure.

The average user-access switch utilization is currently between 10 percent and 20 percent in many regions, and closer to 30 percent or 40 percent in development-heavy regions like the Puget Sound and Asia.
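The utilization figures above are ratios of used capacity to provisioned capacity. The exact metric isn’t specified here, so this Python sketch assumes utilization means active access ports divided by provisioned ports; the switch names, port counts, and 25 percent consolidation threshold are hypothetical. For example, a switch stack with 72 active ports out of 480 runs at 15 percent, inside the 10 to 20 percent range cited for many regions.

```python
def port_utilization(active_ports: int, total_ports: int) -> float:
    """Share of provisioned access ports that are in use, as a percentage."""
    if total_ports <= 0:
        raise ValueError("total_ports must be positive")
    return 100.0 * active_ports / total_ports


def consolidation_candidates(
    switches: dict[str, tuple[int, int]], threshold_pct: float = 25.0
) -> list[str]:
    """Return, sorted by name, the switches whose utilization is below threshold_pct.

    `switches` maps a switch name to (active ports, provisioned ports).
    """
    return sorted(
        name
        for name, (active, total) in switches.items()
        if port_utilization(active, total) < threshold_pct
    )


print(port_utilization(72, 480))  # 15.0
```

Flagging low-utilization switches this way gives a simple, repeatable starting point for deciding where wired capacity can be consolidated.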

Network infrastructure security

As devices and services have become more internet facing, gaps in capability or new threats have emerged that require additional security controls to augment the traditional controls that remain.

Cloud-based services often require application firewalls or private access controls to limit attack surfaces, enforce strong authentication, and detect inappropriate or malicious use. Devices often require solutions that remain effective when they move between network media (wired, wireless, cellular), to different network segments, or between corporate and external networks.

The threat landscape is constantly changing, and the way devices or services depend on the network has also become fundamentally different with cloud, mobility, and new types of networked devices. Because of this, the mechanisms that provide effective controls must adapt and continue to evolve.

Key takeaways

As our strategy unfolds, we’ll be deciding which areas of our vision and strategy to prioritize for investment over time. To that end, we’re currently addressing several questions that directly influence how we move forward with next-generation connectivity:

  • What is the preferred edge solution for our knowledge workers?
  • How do we design the developer and engineering segment?
  • What are the security requirements to protect the unmanaged devices that connect to our network every day?

We’re continuing to make changes to adapt to new traffic patterns, use cases, security requirements, and other use-driven demands of our network infrastructure. We’ll continue to transform our network, embracing new WAN and edge security models associated with user identities, devices, and applications. Software-defined infrastructure and processes will bring the network and application security to the places where end users can consume them when needed, regardless of where they’re physically located, thus providing a more agile, more secure, and more effective network environment throughout Microsoft.
