Azure networking Archives - Inside Track Blog
http://approjects.co.za/?big=insidetrack/blog/tag/azure-networking/
How Microsoft does IT
Mon, 02 Feb 2026 21:37:26 +0000

Vuln.AI: Our AI-powered leap into vulnerability management at Microsoft
http://approjects.co.za/?big=insidetrack/blog/vuln-ai-our-ai-powered-leap-into-vulnerability-management-at-microsoft/
Thu, 16 Oct 2025 16:05:00 +0000

In today’s hyperconnected enterprise landscape, vulnerability management is no longer a back-office function—it’s a frontline defense. With thousands of devices from a multitude of vendors, and a relentless stream of Common Vulnerabilities and Exposures (CVEs), here at Microsoft we faced a challenge familiar to every IT decision maker: how to scale vulnerability response without scaling cost, complexity, or risk.

A photo of Fielder.

“While AI enables amazing capabilities for knowledge workers, it also increases the threat landscape, since bad actors using AI are constantly probing for vulnerabilities. Vuln.AI helps keep Microsoft safe by identifying and accelerating the mitigation of vulnerabilities in our environment.”

Brian Fielder, vice president, Microsoft Digital 

Enter Vuln.AI, an intelligent agentic system developed by our team in Microsoft Digital—the company’s IT organization—to transform how we identify, prioritize, and resolve vulnerabilities across our enterprise network.

Manual methods can’t keep up

As a company, we detect over 600 million cybersecurity threats every day, according to our latest Digital Defense Report. Some of those signals are bad actors probing our internal network and infrastructure looking for unpatched vulnerabilities. Our infrastructure supports over 300,000 employees and vendors, 25,000 network devices, and over 560 buildings across 102 countries. This scale means we face a constant stream of vulnerabilities—each requiring triage, impact analysis, and remediation.

“While AI enables amazing capabilities for knowledge workers, it also increases the threat landscape, since bad actors using AI are constantly probing for vulnerabilities. Vuln.AI helps keep Microsoft safe by identifying and accelerating the mitigation of vulnerabilities in our environment,” says Brian Fielder, a vice president within Microsoft Digital. 

Historically, our Infrastructure, Networking, and Tenant team here in Microsoft Digital relied on manual assessments to determine which network devices were impacted by new vulnerabilities. Traditional vulnerability scanning tools generate a lot of false positives and false negatives, and a significant amount of analysis still falls to security engineers, requiring manual validation before any vulnerability impact can be communicated to device owners. These manual methods were time-consuming, error-prone, and reactive—our security engineers were spending hours on each vulnerability, at times missing critical threats or sinking too much time into false alarms.

A photo of Bansal.

“AI’s true power lies in the problem it’s applied to. Start by identifying the most time-consuming or painful task in your organization, then explore how AI can augment or improve it. Begin with a small, targeted enhancement and iterate continuously.”

Ankit Bansal, senior product manager, Microsoft Digital

With the vast number of vulnerabilities coming in every day, security engineers needed a scalable way to quickly analyze, prioritize, and respond.

The solution: Vuln.AI

We already achieved dramatic impact with our AI Ops and Network Infrastructure Copilot, which is on track to save us over 11,000 hours of network service management time per year. We built Vuln.AI on top of that investment:

  1. The Research Agent analyzes vulnerability feeds and network metadata from our Infrastructure Data Lakehouse (IDL) built on top of Azure Data Explorer, which regularly ingests data from our device vendors and other sources. Once new vulnerabilities are detected, it automates the identification of impacted devices and integrates with other internal tooling for validation and reporting.
  2. The Interactive Agent acts as a gateway for engineers and device owners to ask follow-up questions and initiate remediation. Through agent-to-agent interaction, it leverages our Network Infrastructure Copilot to query the research agent’s findings. This agentic interface enables real-time decision-making and contextual insights.

Together, these agents are significantly improving our network security operations. The results we’re seeing so far are compelling:

  • A 70% reduction in time to vulnerability insights, enabling faster prioritization and mitigation, minimizing exposure windows.
  • Lower risk of compromise through increased accuracy, quicker detection, and containment of threats.
  • A stronger compliance posture that supports adherence to financial, legal, and regulatory requirements.
  • Higher accuracy in identifying vulnerable devices, reducing false positives and missed threats.
  • Engineering hours saved and reduced fatigue, significantly improving productivity.

Our gains translate to lower operational risk, faster response times, and more resilient infrastructure—critical outcomes for any enterprise navigating today’s threat landscape.

“AI’s true power lies in the problem it’s applied to,” says Ankit Bansal, a senior product manager within Microsoft Digital. “Start by identifying the most time-consuming or painful task in your organization, then explore how AI can augment or improve it. Begin with a small, targeted enhancement and iterate continuously.”

How Vuln.AI works

The system continuously ingests CVE data from our device suppliers’ API feeds and a publicly available database of known cybersecurity vulnerabilities. It correlates that data with device attributes such as hardware model and operating system to identify the potential impact on the network and surface actionable insights.
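At its core, this correlation step matches each advisory’s affected hardware models and OS versions against the device inventory. Here is a minimal sketch of the idea; the field names and data shapes are illustrative assumptions, not Vuln.AI’s actual schema:

```python
# Illustrative sketch: correlate CVE advisories with a device inventory.
# Field names and structures are hypothetical, not Vuln.AI's real schema.

def find_impacted_devices(advisories, inventory):
    """Return {cve_id: [device names]} for devices whose model and
    OS version fall inside an advisory's affected set."""
    impacted = {}
    for adv in advisories:
        hits = [
            d["name"]
            for d in inventory
            if d["model"] in adv["affected_models"]
            and d["os_version"] in adv["affected_versions"]
        ]
        if hits:
            impacted[adv["cve_id"]] = hits
    return impacted

advisories = [{
    "cve_id": "CVE-2025-0001",
    "affected_models": {"switch-9300"},
    "affected_versions": {"17.3.1", "17.3.2"},
}]
inventory = [
    {"name": "sea-sw-01", "model": "switch-9300", "os_version": "17.3.1"},
    {"name": "sea-sw-02", "model": "switch-9300", "os_version": "17.6.0"},
]

print(find_impacted_devices(advisories, inventory))
# → {'CVE-2025-0001': ['sea-sw-01']}
```

In the real system, the same matching runs at scale over vendor feeds and Infrastructure Data Lakehouse metadata rather than in-memory lists.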

Engineers interact with the system via Copilot, Teams, or custom tooling, which allows seamless integration with our network security teams’ daily workflows.

“We built a hybrid approach in Vuln.AI to guide LLMs through complex security advisories,” says Blaze Kotsenburg, a software engineer in Microsoft Digital. “By combining structured function calls, templated prompts, and data validation, we keep the model focused on producing reliable, actionable insights for vulnerability mitigation.”

A photo of Lollis.

“We chose Durable Functions for Vuln.AI because it allowed us to confidently orchestrate complex, stateful research. The reliability and simplicity of the framework meant we could shift our focus to engineering the intelligence behind the agent, especially the prompting strategies used in Vuln.AI’s backend processing.”

Mike Lollis, senior software engineer, Microsoft Digital

When it came to building Vuln.AI, we relied heavily on our own technology platforms, including: 

  • Azure AI Foundry for model development and deployment
  • Azure Data Explorer to store device metadata and CVEs
  • Agent-to-agent interaction with our Network Infrastructure Copilot to query our database for device and inventory knowledge
  • Azure OpenAI models for natural language processing and classification
  • Azure Durable Functions for fine-grained orchestration and custom LLM workflows
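Durable Functions gives the team a fan-out/fan-in pattern: spawn one analysis activity per candidate device, then gather the results. The actual implementation uses Azure Durable Functions orchestrations; this plain-Python sketch only illustrates the shape of the pattern, and the step names are hypothetical:

```python
# Plain-Python sketch of the fan-out/fan-in orchestration pattern that
# Durable Functions provides. The real Vuln.AI workflow runs these steps
# as Azure Durable Functions activities; names here are illustrative.

def analyze_device(device, cve_id):
    # Stand-in for one activity: an LLM-assisted impact check per device.
    return {"device": device, "cve": cve_id, "impacted": device.endswith("01")}

def orchestrate(cve_id, devices):
    # Fan out one analysis task per device, then fan in the results.
    results = [analyze_device(d, cve_id) for d in devices]
    return [r["device"] for r in results if r["impacted"]]

print(orchestrate("CVE-2025-0001", ["sea-sw-01", "sea-sw-02"]))
# → ['sea-sw-01']
```

The value of the durable framework is that each fan-out step is checkpointed, so a long-running research job survives restarts without re-doing completed work.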

“We chose Durable Functions for Vuln.AI because it allowed us to confidently orchestrate complex, stateful research,” says Mike Lollis, a senior software engineer in Microsoft Digital. “The reliability and simplicity of the framework meant we could shift our focus to engineering the intelligence behind the agent, especially the prompting strategies used in Vuln.AI’s backend processing.”

Vuln.AI in action

Consider a common scenario: a new CVE that affects a network switch has just been published. Vuln.AI’s research agent immediately flags the vulnerability, maps it to potentially affected devices in our network inventory, and pushes the findings to an internal database.

A photo of Lee.

“AI is only as good as the data you provide. Much of the success with Vuln.AI came from our dedicated efforts to source comprehensive vulnerability data and device attributes. For effective AI-powered solutions, you really need to invest in a strong data foundation and a strategy for how to integrate into the rest of your infrastructure.”

Linda Lee, product manager II, Microsoft Digital

This data then becomes immediately accessible in our internal tools, where it is validated and approved by security engineers. Following this, network engineers are provided with precise information about their vulnerable devices.

Engineers can prompt Vuln.AI’s interactive agent to instantly retrieve the following information:

“12 devices impacted by CVE-2025-XXXX. Would you like me to suggest some next steps for mitigation or remediation?”

With Vuln.AI, network engineers can now begin vulnerability response operations much more quickly—no spreadsheet wrangling and no delays.
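A reply like the one quoted above is a thin formatting layer over the research agent’s validated findings. The sketch below is hypothetical; the real agent answers through Copilot and agent-to-agent calls, so the function name and wording are illustrative only:

```python
# Hypothetical sketch of how the interactive agent might turn research
# findings into an engineer-facing reply. Illustrative only; the real
# agent surfaces this through Copilot, Teams, or custom tooling.

def summarize_findings(cve_id, impacted_devices):
    count = len(impacted_devices)
    return (
        f"{count} device{'s' if count != 1 else ''} impacted by {cve_id}. "
        "Would you like me to suggest some next steps for mitigation "
        "or remediation?"
    )

print(summarize_findings("CVE-2025-0001", ["sea-sw-01"] * 12))
```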

“AI is only as good as the data you provide,” says Linda Lee, a product manager II within Microsoft Digital. “Much of the success with Vuln.AI came from our dedicated efforts to source comprehensive vulnerability data and device attributes. For effective AI-powered solutions, you really need to invest in a strong data foundation and a strategy for how to integrate into the rest of your infrastructure.”

At its heart, Vuln.AI is about automating manual workflows and research.

“Vuln.AI has reduced our triage time by over 50%,” says Vincent Bersagol, a principal security engineer in Microsoft Digital.

This is allowing our engineers to focus on deeper analysis.

“The synergy between security and AI engineering has unlocked a new level of precision in vulnerability insights,” Bersagol says. “This is just the beginning.”

The journey ahead

Our journey with AI-powered vulnerability management has only just begun. Looking ahead, our roadmap for Vuln.AI includes:

  • Extending data coverage to include more hardware suppliers
  • Integrating more detailed device profiles for more targeted vulnerability response
  • Supporting autonomous workflows to streamline network engineers’ remediation efforts
  • Incorporating other AI agents to support more security use cases

These enhancements will further reduce risk, accelerate response times, and empower engineers to focus on more strategic initiatives.

“Trust is the foundation of everything we do in Microsoft Digital,” Bansal says. “Securing our network is essential to upholding that trust. Intelligent solutions like Vuln.AI not only help us stay ahead of emerging threats—they also establish the blueprint for integrating AI more deeply into our security operations.”

For IT leaders, Vuln.AI offers a blueprint for modern vulnerability management:

  • Scalable: Handles thousands of devices and vulnerabilities with ease
  • Accurate: Reduces false positives and missed threats
  • Efficient: Saves time, money, and resources
  • Secure: Built on Microsoft’s trusted AI and security frameworks

In a world where every second counts and any threat can be costly, Vuln.AI transforms vulnerability management from a bottleneck into a competitive advantage for Microsoft.

Key takeaways

As your organization looks for ways to improve security and threat response in a fast-changing landscape, consider the following insights on how AI is reshaping vulnerability management at Microsoft:

  • Fight fire with fire: The threat landscape has broadened dramatically due to bad actors using AI. Supplementing your own efforts with AI can help you manage your risk more effectively than traditional vulnerability management.
  • Agility is key: Effective vulnerability response hinges on acting fast. An AI-powered solution like Vuln.AI can cut the time needed to analyze and mitigate vulnerabilities by over 50%, enabling organizations to enhance security operations at scale.
  • The future is now: Looking ahead, Microsoft Digital will integrate agentic workflows into more security operations, boosting efficiency in risk prevention, threat detection and response, thereby enabling security practitioners and developers to focus on more strategic projects.

Keeping our in-house optical network safe with a Zero Trust mentality
http://approjects.co.za/?big=insidetrack/blog/keeping-our-in-house-optical-network-safe-with-a-zero-trust-mentality/
Thu, 16 Oct 2025 16:00:00 +0000

When it comes to corporate connectivity at Microsoft, a minute of lost connection can lead to catastrophic disruptions for our product teams, sleepless nights for our network engineers, and millions of dollars of lost value for the company.

That’s why we built our own optical network at our headquarters in Washington state, and that’s why we’re building similar networks at other regional campuses around the United States and the rest of the world.

With so much on the line, we need to make sure these in-house networks never go down.

But how are we doing that?

We’re applying the same robust Zero Trust approach we take to security and identity. While our optical networks are extremely reliable, any complex system can be knocked offline. In alignment with the Zero Trust mentality we have as a company, we couldn’t simply trust the integrity of what we’d built; we needed a resilient backup system that went beyond redundancy to provide true resilience.

Driven by this goal, we created a Zero Trust Optical Business Continuity Disaster Recovery (BCDR) network that combines two fully independent optical systems designed to sustain uninterrupted services, even during systemic failures. The result is more confidence for our employees and vendors, less pressure on our network engineers, and comprehensive network resilience that will protect us against a major outage.

The urgency of resilience

In 2021, our team in Microsoft Digital, the company’s IT organization, deployed our first next-generation optical network to serve the exclusive network needs of our Puget Sound metro campuses. It offers more bandwidth on less fiber for a lower operational cost than leasing from traditional carriers.

“Puget Sound is a highly concentrated developer network where we need to provide very high throughput,” says Patrick Alverio, principal group software engineering manager for Infrastructure and Engineering Services within Microsoft Digital. “Our optical system is the backbone of all that traffic.”

Our state-of-the-art optical network fulfills our need for fast and reliable connectivity at up to 400 Gbps between core sites, labs, data centers, and the internet edge. We built this network on the Reconfigurable Optical Add/Drop Multiplexer (ROADM) technology, delivering dynamic reconfiguration, colorless, directionless, contentionless (CDC) capabilities, flexible grid support, remote provisioning, and automation. It also features a full-mesh topology that provides a layer of redundancy.

But what if the entire ROADM-based system fails?

There are plenty of operational risks that can derail even the most robust network. Anything from misconfigured automation scripts to policy changes to misaligned software versioning to simple human error can cause outages.

A photo of Elangovan

“We don’t want even a second of downtime. We needed a life raft for when failures occur that could also function as a standby network for core site migrations or platform upgrades.”

Vinoth Elangovan, senior network engineer, Hybrid Core Network Services, Microsoft Digital

To some degree, those kinds of minor disruptions are inevitable. But catastrophic events like fiber cuts, failures in the ROADM operating system, or even natural disasters have the potential for even more wide-ranging disruption.

During a catastrophic outage, thousands of engineers, developers, researchers, and other technical employees who need access to crucial lab environments and data centers could lose connectivity. That can sabotage feature delivery, disrupt product patches, interrupt updates, and halt all kinds of core product functions.

On top of normal software development operations, new AI tools demand massive bandwidth and consistent uptime. Finally, our hybrid networks feature paths integrated with Microsoft Azure that consume on-premises resources, so they also stand to benefit from increased resilience.

A catastrophic network outage can cause incredible damage to all of these business functions. In fact, we experienced exactly that in 2022.

A fiber cut combined with a ROADM system hardware reboot caused a five-minute outage at our Puget Sound metro region. In this environment, every minute of lost connectivity can result in significant financial impact, making network resilience absolutely essential.

“We don’t want even a second of downtime,” says Vinoth Elangovan, senior network engineer, who designed and implemented the Zero Trust Optical BCDR network for Microsoft. “We needed a life raft for when failures occur that could also function as a standby network for core site migrations or platform upgrades.”

Delivering greater network resilience

To ensure we could deliver uninterrupted network connectivity even in the midst of a catastrophic outage, we needed to consider the technical demands of a truly resilient system. Five design pillars helped us assemble our architectural criteria:

  1. Independent optical systems: To provide true resilience, our primary and BCDR platforms needed to operate autonomously.
  2. Physically independent paths: Circuits should avoid shared conduits, fibers, and splices to operate completely independently.
  3. Separate control software: The primary and backup networks should operate through dedicated network management systems (NMSs), automation, and provisioning domains.
  4. Unified client interface: Both systems needed to terminate into the same interface to unify service for clients and applications.
  5. Survivability by design: We couldn’t assume that any system would be immune to failure. Instead, we built for the best possible outcomes.

The result was the Zero Trust Optical BCDR architecture, a layered approach to optical networking. It consists of our primary, ROADM-based transport layer and a secondary, MUX-based transport layer, both terminating into a single logical port channel.

“Our core responsibility is the employee experience, so our main design thrust was making sure service is seamless and uninterrupted—even during an outage.”

Vinoth Elangovan, senior network engineer, Hybrid Core Network Services, Microsoft Digital

Both systems are live and active, which means they deliver production services through their own independent fibers, power supplies, and software stacks. By layering fully independent optical domains and logically unifying them at the Ethernet edge, the network can sustain a complete failure of one system and maintain continuity.

That physical and operational independence is the difference between simple redundancy and robust resilience.

“Our core responsibility is the employee experience, so our main design thrust was making sure it’s seamless and uninterrupted—even during an outage,” Elangovan says.

Optical network backed by a BCDR network

A schematic of an optical network running between different nodes and backed up by a BCDR network.
The optical network in our Puget Sound region connects core sites to labs, datacenters, and the internet edge, while the BCDR network provides backup connections to deliver resilience in case of a catastrophic network failure.

A typical ROADM optical network connects campus and data center sites to the internet edge. Our design features three interconnected optical rings, with two internet edges as multi-directional nodes, while other sites operate as dual-degree nodes with bidirectional redundancy. Meanwhile, our campuses and datacenters are designated as critical sites and equipped with optical BCDR links for enhanced resiliency. In the event of a complete optical ROADM line failure, these critical sites retain connectivity.

In the event of an outage on the primary network, the port channel handles forward continuity automatically, shifting WAN traffic between optical paths in real time.

The transition occurs seamlessly and transparently, with no noticeable impact to clients.
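One way to picture this behavior: the port channel hashes each flow across its healthy member links, so when one optical domain drops out, its flows simply redistribute to the survivor. The model below is an illustration of that idea, not how the switch hardware is actually implemented:

```python
# Illustrative model of port-channel failover between two optical domains:
# flows are hashed across healthy member links, so a failed member's
# traffic redistributes automatically. Real port-channel/LACP behavior
# lives in the switch; this sketch only demonstrates the concept.

import zlib

def pick_link(flow_id, links):
    healthy = [l for l in links if l["up"]]
    if not healthy:
        raise RuntimeError("no healthy member links")
    idx = zlib.crc32(flow_id.encode()) % len(healthy)
    return healthy[idx]["name"]

links = [{"name": "roadm-path", "up": True}, {"name": "bcdr-path", "up": True}]
flow = "10.0.0.1->10.0.0.2:443"

before = pick_link(flow, links)
links[0]["up"] = False            # simulate a complete ROADM-side failure
after = pick_link(flow, links)
print(before, "->", after)        # the flow lands on the surviving path
```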

A photo of Martin

“Our initial goal was to provide high-throughput connectivity for major labs, with less than six minutes of downtime per year. That represents a service level of 99.999% network continuity, and we’re aiming for even better moving forward.”

Blaine Martin, principal engineering manager, Hybrid Core Network Services, Microsoft Digital

Coupling at the Ethernet layer provides clients and applications with one logical interface, automatic load balancing and traffic distribution, and seamless failover, regardless of which optical domain is providing service.

“Our initial goal was to provide high-throughput connectivity for major labs, with less than six minutes of downtime per year,” says Blaine Martin, principal engineering manager for Hybrid Core Network Services in Microsoft Digital. “That represents a service level of 99.999% network continuity, and we’re aiming for even better moving forward.”
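The “five nines” target maps directly to that downtime budget. A quick check, assuming a 365.25-day year:

```python
# Quick check of the "five nines" downtime budget quoted above,
# assuming a 365.25-day year.

availability = 0.99999
minutes_per_year = 365.25 * 24 * 60
allowed_downtime = (1 - availability) * minutes_per_year
print(f"{allowed_downtime:.2f} minutes/year")  # ≈ 5.26 minutes
```

That works out to roughly 5.26 minutes of downtime per year, comfortably under the six-minute goal.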

A new era of confidence for network engineers

For the network engineers who keep Microsoft employees and resources connected, the Zero Trust Optical BCDR network relieves much of the pressure that comes from resolving outages.

“Before, we were dependent on a single system, even with redundancies, so the human experience was like firefighting. Now, if the primary optical network is having a problem, I don’t even see it.”

Kevin Bullard, principal cloud network engineering manager, Microsoft Digital

When a network goes down, engineers have an enormous set of responsibilities to manage: processing the incident report, assigning severity, performing checks, notifying internal teams, providing updates, and engaging with physical support teams—all with a profound urgency to restore productivity.

Dialing those pressures back has been a huge benefit.

“Before, we were dependent on a single system, even with redundancies, so the human experience was like firefighting,” says Kevin Bullard, Microsoft Digital principal cloud network engineering manager responsible for maintaining WAN interconnectivity between labs. “Now, if the primary optical network is having a problem, I don’t even see it.”

There will always be pressure on network engineers to restore connectivity during an outage, but they can breathe easier knowing it won’t cost the company millions of dollars as the time to resolve ticks away. And in non-emergency situations like core site migrations, the BCDR network provides a much easier way to shunt services while the main network is offline.

“Our internal users have become more confident that they can stay connected, no matter what,” says Chakri Thammineni, principal cloud network engineer for Infrastructure and Engineering Services in Microsoft Digital. “That gives the people responsible for maintaining our enterprise networks incredible peace of mind.”

Fortunately, there hasn’t been a substantial network outage in the Puget Sound metro area since 2022. But our network engineering teams know that if and when it happens, the BCDR network will be ready to maintain service continuity.

A photo of Alverio.

“We’re always looking ahead into industry trends to stay at the bleeding edge, whether that’s in the technology we provide for our customers or the networks we use to do our own work.”

Patrick Alverio, principal group software engineering manager, Infrastructure and Engineering Services, Microsoft Digital

With our Puget Sound network protected, we have plans in place to extend this model to other metro areas. Naturally, we have to balance population, criticality, and the knowledge that elevated reliability and availability come with a cost.

Our selection criteria for new BCDR networks have largely centered around two factors: expansions of AI-critical infrastructure and concentrations of secure access workspaces (SAWs) for technical employees. With these criteria in mind, we’re planning new BCDR networks first in the Bay Area and Dublin, then in Virginia, Atlanta, and London.

Zero Trust optical BCDR architecture represents a paradigm shift in enterprise network resilience, and we’re committed to expanding the model to benefit both conventional workloads and the expanding infrastructure demands of AI.

“We’re always looking ahead into industry trends to stay at the bleeding edge, whether that’s in the technology we provide for our customers or the networks we use to do our own work,” Alverio says. “We refuse to accept the status quo, and we’re elevating the experience for employees across Puget Sound and Microsoft as a whole.”

Driving AI innovation in optical network resilience

Our journey towards an AI-driven optical network is gaining momentum.

As part of our Secure Future Initiative, we’ve automated our Optical Management Platform credential rotation and are actively developing intelligent incident management ticket enrichment, auto-remediation, link provisioning, deployment validation, and capacity planning.

AI plays a central role in this transformation.

With Microsoft 365 Copilot and GitHub Copilot integrated into our engineering workflows, we’re accelerating development cycles, improving code accuracy, and uncovering optimization opportunities that would otherwise take hours of manual effort.

These Copilots are also helping our engineers analyze network patterns, simulate outcomes, and validate deployment logic before execution, reducing human error and strengthening our Zero Trust posture. Over time, we’re evolving toward a system where AI not only assists but proactively predicts potential disruptions, recommends remediations, and continuously learns from operational telemetry.

These advancements are paving the way for a future where our optical infrastructure can anticipate issues, recover faster, and operate with the agility and assurance expected in a Zero Trust environment.

Key takeaways

If you’re considering implementing your own optical and BCDR networks, consider these tips:

  • Understand the technical components of resilience: Independent optical systems, physically independent paths, separate control software, a unified client interface, and survivability by design are the key technical components of true resilience.
  • Plan from a preparedness and value perspective: Evaluate the critical points in your infrastructure and determine where you can get the most value out of resilient connectivity.
  • Ensure your teams have the right skillset: Carefully consider the right workforce to run those systems and be accountable for their operation.

Unleashing API-powered agents at Microsoft: Our internal learnings and a step-by-step guide
http://approjects.co.za/?big=insidetrack/blog/unleashing-api-powered-agents-at-microsoft-our-internal-learnings-and-a-step-by-step-guide/
Thu, 02 Oct 2025 16:05:00 +0000

Agentic AI is the frontier of the AI landscape. These tools show enormous promise, but harnessing their power isn’t always as straightforward as prompting a model or accessing data from Microsoft 365 apps. To reach their full potential in the enterprise, agents sometimes need access to data beyond Microsoft Graph. But giving them access to that data relies on an extra layer of extensibility.

To meet these demands, many of our teams within Microsoft Digital, the company’s IT organization, have been experimenting with API-based agents. This approach combines the best of two worlds: accessing diverse apps and data repositories and eliminating the need to build an agent from the ground up.

We want to empower every organization to unlock the full power of agents through APIs. The lessons we’ve learned on our journey can help you get there.

The need for API-based agents

The vision for Microsoft 365 Copilot is to serve as the enterprise UX. Within that framework, agents serve as the background applications that streamline workflows and save our employees time.

For many users, the out-of-the-box access Copilot provides to Microsoft Graph is enough to support their work. It surfaces the data and content they need while providing a foundational orchestration layer with built-in capabilities around compliance, responsible AI, and more.

But there are plenty of scenarios that require access to other data sources.

“Copilot provides you with data that’s fairly static as it stands in Microsoft Graph,” says Shadab Beg, principal software engineering manager on our International Sovereign Cloud Expansion team. “If you need to query from a data store or want to make changes to the data, you’ll need an API layer.”

By using APIs to extend agents built on the Copilot orchestration layer, organizations can apply its reasoning capabilities to new data without the need to fine-tune their models or create new ones from scratch. The possibilities these capabilities unlock are driving a boom in API-based agents for key functions and processes.

A photo of Nasir.

“Cost is one of the most critical dimensions in how we design, deploy, and scale our solutions. Declarative API-driven agents in Microsoft 365 Copilot offer a path to unify agentic experiences while leveraging shared AI compute and infrastructure.”

Faisal Nasir, AI Center of Excellence and Data Council lead, Microsoft Employee Experience

In many ways, IT organizations like ours are the ideal places to implement API-based agents. Our teams are adept at creating and deploying internal solutions to solve technical challenges, and IT work is often about enablement and efficiency—exactly what agents do best.

“Cost is one of the most critical dimensions in how we design, deploy, and scale our solutions,” says Faisal Nasir, AI Center of Excellence and Data Council lead in Microsoft Employee Experience. “Declarative API-driven agents in Microsoft 365 Copilot offer a path to unify agentic experiences while leveraging shared AI compute and infrastructure. By aligning with core architectural principles such as efficiency, scalability, and sustainability, we can ensure these agents not only drive intelligent outcomes but also maximize value across service areas with minimal overhead.”

{Learn more about our vision and strategy around deploying agents internally at Microsoft.}

The Azure FinOps Budget Agent

Our Azure FinOps Budget Agent is a perfect example of a scenario for API-based agents.

The team responsible for managing our Microsoft Azure budget for IT services was looking for ways to reduce costs by 10–20 percent. To do that effectively, service and finance managers needed the ability to track their spending quickly, accurately, and easily.

The conventional approach to solving this problem would be creating a dashboard with access to the relevant data. The problem with a UI-based approach is that it tends to cater to more specific personas by providing data only they need while oversaturating others with information that’s irrelevant to their work.

“Azure spend is basically the lifeline for our services,” says Faris Mango, principal software engineering manager for infrastructure and engineering services within Microsoft Digital. “Getting the information you need in a concise format that provides a nice, holistic view can be challenging.”

With the advent of generative AI and Microsoft 365 Copilot, the team knew that a natural language interface would be much more intuitive. The result was the Azure FinOps Budget Agent.

The team created the agent and the necessary APIs using Microsoft Visual Studio Code. Its tables and functions run on Azure Data Explorer, allowing the APIs and their consumers to access data almost instantaneously, thanks to its low latency and rapid read speeds.

The tool retrieves data by running Azure Data Factory pipelines that pull and transform data from three sources:

  • Our SQL Server for service budget and forecast data
  • Azure Spend for the actual spending amounts
  • Projected spending, a separate service stored in other Azure Data Explorer tables

Processing the information relies on our business logic’s join operations, followed by aggregations by fiscal year and service tree levels. These summarize the data per service, team group, service group, and organization.
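The join-and-aggregate step can be sketched in plain Python. This is an illustration only; the production logic runs as Kusto functions in Azure Data Explorer, and the field and service names here are hypothetical:

```python
from collections import defaultdict

# Hypothetical rows from two of the three sources after the pipeline pulls them.
budgets = [  # from SQL Server: budget and forecast per service
    {"service": "NetPortal", "group": "Infra", "fy": "FY25", "budget": 120.0},
    {"service": "EngPortal", "group": "Infra", "fy": "FY25", "budget": 80.0},
]
actuals = [  # from Azure Spend: actual spending amounts
    {"service": "NetPortal", "fy": "FY25", "actual": 95.0},
    {"service": "EngPortal", "fy": "FY25", "actual": 88.0},
]

def summarize(budgets, actuals):
    """Join budgets with actuals per service, then roll up by fiscal
    year and service-tree level (service and service group here)."""
    actual_by_key = {(a["service"], a["fy"]): a["actual"] for a in actuals}
    per_service = []
    rollup = defaultdict(lambda: {"budget": 0.0, "actual": 0.0})
    for b in budgets:
        actual = actual_by_key.get((b["service"], b["fy"]), 0.0)
        per_service.append({**b, "actual": actual, "variance": actual - b["budget"]})
        agg = rollup[(b["group"], b["fy"])]
        agg["budget"] += b["budget"]
        agg["actual"] += actual
    return per_service, dict(rollup)

per_service, by_group = summarize(budgets, actuals)
print(per_service[1]["variance"])   # EngPortal overspends by 8.0
print(by_group[("Infra", "FY25")])  # group-level rollup
```

The key idea is that the aggregation happens once, at ingestion time, so the agent never performs heavy joins while a user waits.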

After the back end processes the day’s data, it ingests the information into our Azure Data Explorer tables, which the agent accesses by calling Kusto functions (Kusto is the query language for Azure Data Explorer). The outcome is very low latency: typically, the agent returns results in under 500 milliseconds.

For users, the tool is stunningly simple. They just access Copilot and navigate to the Azure FinOps Budget Agent.

The agent provides three core prompts at the very top of the interface: “My budgets,” “Service budget information,” and “Service group budget information.” Clicking on one of these pre-loaded prompts returns role-specific information around budget, forecasts, actuals, projections, and variance, all at a single glance. The interface even includes graphs to help people track spending visually.

If users are looking for more specific information, they can input their own queries. For example:

  • “Get me the monthly breakdown of service Azure Optimization Assessment analytics.”
  • “Find me the service in this tree with the highest budget.”
  • “Show me the Azure budget for our facilities reporting portal.”
  • “Which service deviates most from its budget forecasts?”

The Azure FinOps Budget Agent primarily serves two groups: service managers who directly oversee spend for Azure-based services and FinOps managers responsible for larger budget portfolios.

Mango is responsible for the internal UI that helps employees access parts of the Microsoft network. With 18,000–20,000 users per month, budgeting and forecasting are highly dynamic due to traffic fluctuations and the resourcing that supports them. He also oversees the internal portal that helps service engineers manage our networks. That tool is growing rapidly as we onboard more teams, so forecasting is anything but linear.

For both of these services, keeping close track of spending is essential. Mango finds himself checking the Azure FinOps Budget Agent about twice a month to gauge how his services are trending.

“It’s taking me less time to do analysis and come up with accurate numbers. And the enhanced user experience just feels more natural, like you’re asking questions conversationally rather than engaging with a dashboard.”

A photo of Mango.
Faris Mango, principal software engineering manager for infrastructure and engineering services, Microsoft Digital

For FinOps managers, the value is more high-level. They are responsible for overseeing tens of services featuring vast volumes of Azure usage across storage and compute while managing strict budgets. That requires constant vigilance.

Switching context from one dashboard to another to track different Azure management groups was a constant hassle for them. Now, they use the Azure FinOps Budget Agent to get an up-to-date view of the overall spend picture. It gives them a place to start. From there, they can drill down if they see any abnormalities.

“It’s taking me less time to do analysis and come up with accurate numbers,” Mango says. “And the enhanced user experience just feels more natural, like you’re asking questions conversationally rather than engaging with a dashboard.”

The arrival of the Azure FinOps Budget Agent is just one example of how agents take your context and get your people the answers they care about faster at less cost.

Benefits like these are spreading across teams throughout Microsoft. Overall, we’ve been able to save 10–12 percent of our overall Azure cost footprint for Microsoft Digital, and individual users are thrilled at the amount of time and effort they’re saving.

“Now the info is at people’s fingertips. The advantage of an agent is that users don’t have to understand a complex UI, so they can get quick answers and get back to work.”

A photo of Beg.
Shadab Beg, principal software engineering manager, International Sovereign Cloud Expansion

Five key strategies for building an API-based agent

After seeing what we’ve accomplished with API-based agents, you might be wondering how to put them into action at your organization. This step-by-step guide can help you get there.

An API-based agent needs to fulfill multiple requirements. It has to expose APIs, align with real user needs, integrate seamlessly with Microsoft 365 Copilot, and work reliably, efficiently, and scalably. Achieving those outcomes depends on five key strategies.

Start with user intent, not the API

Start by asking a simple but powerful question: What will users actually ask your agent? Instead of designing the API first, flip the process:

  • Gather real user queries to understand actual use cases.
  • Refine the queries using prompt engineering techniques to align them with expected AI behavior.
  • Design the API to provide structured responses to those refined queries.

By starting with user intent, you ensure your agent answers real user questions directly, avoids over-engineering unnecessary endpoints, and delivers meaningful results without excessive back-end processing.

“Now the info is at people’s fingertips,” Beg says. “The advantage of an agent is that users don’t have to understand a complex UI, so they can get quick answers and get back to work.”

Key learning: An API that doesn’t align with user intent won’t be effective—even if you design it well.

Design APIs for Microsoft 365 Copilot integration

It’s important to build an API schema that returns precise, structured data that’s easy for Copilot to consume and that directly answers user queries. Copilot expects responses in under three seconds, so focus on optimizing API responses for low latency.

Once you have your list of key questions, design your API schema to return the exact data you need to answer those questions. Your goal should be to ensure every API response has a structure that makes it easy for Copilot to understand.
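As an illustration, a response for a “monthly breakdown” query might be shaped so that every field maps directly to something Copilot can say back to the user. The field names below are hypothetical, not the actual schema:

```python
def monthly_breakdown_response(service, rows):
    """Return a structure that answers the user's question directly:
    labeled, pre-aggregated values rather than raw table rows."""
    total_budget = sum(r["budget"] for r in rows)
    total_actual = sum(r["actual"] for r in rows)
    return {
        "service": service,
        "months": [
            {
                "month": r["month"],
                "budget": r["budget"],
                "actual": r["actual"],
                "variancePct": round(100 * (r["actual"] - r["budget"]) / r["budget"], 1),
            }
            for r in rows
        ],
        # Summary fields let Copilot answer "how are we trending?" without doing math.
        "totals": {"budget": total_budget, "actual": total_actual},
        "status": "over" if total_actual > total_budget else "within",
    }

resp = monthly_breakdown_response(
    "NetPortal",
    [{"month": "Jul", "budget": 10.0, "actual": 9.0},
     {"month": "Aug", "budget": 10.0, "actual": 12.0}],
)
print(resp["status"])  # "over": 21.0 actual against 20.0 budget
```

Pre-computed fields like `variancePct` and `status` spare the model from arithmetic it might get wrong and keep responses consistent across queries.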

Teach Microsoft 365 Copilot to call your API

Copilot needs to know how to call your API. Manifests and OpenAPI descriptions provide that training.

Create detailed OpenAPI documentation and plugin manifests so Copilot knows what your API does, how to invoke it, and what responses to expect. You’ll likely need to adjust these files through a process of trial and error.
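A minimal sketch of what that looks like, expressed here as a Python dictionary for readability: the `description` fields are what Copilot reads when deciding whether and how to call the endpoint, so they deserve as much iteration as the code. The path and operation names are hypothetical:

```python
import json

# Hypothetical OpenAPI fragment for a budget-lookup endpoint. The
# description fields do the "teaching": Copilot matches user intent
# against them when deciding whether to invoke the API.
openapi_fragment = {
    "openapi": "3.0.1",
    "info": {"title": "FinOps Budget API", "version": "1.0"},
    "paths": {
        "/budget/{serviceName}": {
            "get": {
                "operationId": "getServiceBudget",
                "description": (
                    "Returns budget, actual spend, and variance for a "
                    "named service in the current fiscal year. Use when "
                    "the user asks about a service's Azure budget or spend."
                ),
                "parameters": [{
                    "name": "serviceName",
                    "in": "path",
                    "required": True,
                    "schema": {"type": "string"},
                    "description": "Exact service name from the service tree.",
                }],
                "responses": {"200": {"description": "Budget summary."}},
            }
        }
    },
}

# Serialize to the JSON document you would ship in the plugin package.
print(json.dumps(openapi_fragment, indent=2)[:60])
```

Vague descriptions are the most common reason an agent fails to invoke the right endpoint, which is why tuning these files is iterative.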

Scale APIs for performance and reliability

Once you have your schema and integration in place, it’s time to move on to the primary engineering challenge: making your API scalable, efficient, and reliable.

Prioritize the following goals:

  • Fast response times: Copilot expects quick answers.
  • High scalability: This ensures seamless performance at scale.
  • Reliable uptime: The system needs to remain robust.

We recommend setting a very strict latency limit while implementing your API to retrieve data, since Copilot needs time to generate its response. Existing API endpoints often involve complex data joins rather than simply returning rows from data tables. This complexity can lead to longer processing times, particularly with intricate queries that involve multiple data stores.

To address these potential delays, pre-cache results to significantly enhance performance. This can help you meet the latency requirements Copilot imposes.
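One way to pre-cache, sketched with only the standard library: refresh the expensive joined result out of band so the API call itself is a dictionary lookup. The refresh function and TTL here are assumptions, not the production design:

```python
import time

class PrecomputedCache:
    """Serve API reads from an in-memory snapshot; refresh it out of
    band so no user request ever waits on the heavy joins."""

    def __init__(self, compute, ttl_seconds=3600):
        self._compute = compute          # the expensive join/aggregate step
        self._ttl = ttl_seconds
        self._snapshot = compute()
        self._refreshed_at = time.monotonic()

    def get(self, key):
        # Reads never trigger recomputation: stale-but-fast by design,
        # which fits a daily-ingest pipeline like the one described above.
        return self._snapshot.get(key)

    def maybe_refresh(self):
        # Called by a background timer or job, not by request handlers.
        if time.monotonic() - self._refreshed_at >= self._ttl:
            self._snapshot = self._compute()
            self._refreshed_at = time.monotonic()

def expensive_join():
    # Stand-in for the multi-store join; returns pre-aggregated results.
    return {"NetPortal": {"budget": 120.0, "actual": 95.0}}

cache = PrecomputedCache(expensive_join, ttl_seconds=3600)
print(cache.get("NetPortal"))  # instant lookup, no joins on the hot path
```

The trade-off is freshness: a snapshot refreshed hourly is a good fit for budget data that's ingested daily, but not for data that changes per request.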

At this point, you’ll see why starting with user intent and iteratively refining API design is important. By grounding your work in user behaviors, you’ll align with the following best practices:

  • Structure your response to directly address user queries.
    Instead of just returning raw data, the API should provide meaningful insights Copilot can interpret. Prompt engineering marries user intent with the most understandable API schema.
  • Keep your API flexible enough to adapt to evolving business needs.
    Real-world workflows change over time, and an API should be able to support those changes without massive refactoring.
  • Avoid performance bottlenecks caused by unnecessary complexity.
    Understanding the exact data requirements up front prevents heavy joins, excessive filtering, and inefficient data retrieval logic.
  • Optimize for Copilot’s real-time response constraints.
    With a strict limit on latency, consider pre-optimization techniques like pre-caching results and simplifying query logic from the very beginning of your API implementation.

If you attempt to build a scalable, reliable API without first understanding how users will interact with your agent, you’ll spend months reworking the schema, debugging inefficiencies, and struggling with integration challenges.

Key learning: A fast, scalable, and reliable API isn’t just about technical optimization. It starts with a deep understanding of the questions it needs to answer and how to structure responses so Copilot can interpret them correctly.

Consider compliance and responsible AI

Unlike custom agents or OpenAI API integrations, knowledge-only agents require far less effort to meet Microsoft’s Responsible AI Standard. Microsoft tools’ built-in compliance capabilities handle much of the complexity. As a result, you can focus on efficiency and optimization rather than regulatory hurdles.

“Agent-based automation must balance speed with responsibility,” Nasir says. “We embed compliance, cost control, and telemetry from the start, so our systems don’t just scale, they mature.”

Key learning: It’s helpful to revisit your existing compliance, governance, and responsible AI processes and policies before implementing AI solutions. Copilot adheres to protective structures within your Microsoft technology ecosystem, so this process will ensure you’re starting from the most secure position.

APIs and the agentic future

Building API-based agents is more than just an integration exercise. It’s about creating scalable, intelligent, and compliant AI-driven workflows. By aligning your API design with user intent, you set Microsoft 365 Copilot free to retrieve and interpret information accurately. That leads to a seamless AI experience for your employees.

Thanks to Copilot’s built-in security and compliance features, API-based Copilot agents are some of the most efficient, compliant, and enterprise-ready ways to deploy AI solutions. They represent another step into an AI-first future tailored to your employees’ and organization’s needs.

Tools like API-based agents democratize the information we all need to do our jobs better, because we’re all getting the same data from the same place. This is why an AI-first mindset is actually human-first.

Key takeaways

Here are some things to keep in mind when designing agent-powered experiences that are fast, reliable, and aligned with user expectations.

  • Response time is key. Favor single, low-latency API calls that satisfy both Copilot’s technical requirements and users’ expectations.
  • Consider the source. Data has to be high-quality on the backend. It’s worth reviewing your data and ensuring the hygiene is good.
  • Agents and APIs need to align. Design with task-centric, well-structured agents. Determine your high-level goals, then use standards like OpenAPI or graph schemas to describe task endpoints. Define each API’s capability, input schema, and expected outcome very clearly.
  • Plan ahead to avoid surprises. Design your APIs to minimize potential side effects, especially through enabling natural-language-to-API mapping, because that’s the biggest change in methodology.
  • Design for visibility. Agents need to be observable and explainable, so implement metrics-driven monitoring. Having API-level telemetry in addition to Copilot-level telemetry enables continuous improvement.
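API-level telemetry of the kind the last point describes can start as small as a timing wrapper around each endpoint handler. This is a sketch, not a monitoring framework; in production the records would flow to your telemetry pipeline:

```python
import functools
import time

METRICS = []  # stand-in for a real telemetry sink

def instrument(endpoint_name):
    """Record latency and success/failure per call so agent behavior
    can be correlated with API behavior."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            ok = True
            try:
                return fn(*args, **kwargs)
            except Exception:
                ok = False
                raise
            finally:
                METRICS.append({
                    "endpoint": endpoint_name,
                    "ms": (time.perf_counter() - start) * 1000,
                    "ok": ok,
                })
        return wrapper
    return decorator

@instrument("getServiceBudget")
def get_service_budget(name):
    # Hypothetical handler; returns the structured response Copilot consumes.
    return {"service": name, "budget": 120.0}

get_service_budget("NetPortal")
print(METRICS[0]["endpoint"], METRICS[0]["ok"])
```

Pairing records like these with Copilot-level telemetry lets you tell whether a slow or wrong answer came from the model or from the API behind it.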

The post Unleashing API-powered agents at Microsoft: Our internal learnings and a step-by-step guide appeared first on Inside Track Blog.

]]>
Transforming our VPN with Global Secure Access at Microsoft http://approjects.co.za/?big=insidetrack/blog/transforming-our-vpn-with-global-secure-access-at-microsoft/ Thu, 25 Sep 2025 16:00:00 +0000 http://approjects.co.za/?big=insidetrack/blog/?p=20360 Ensuring safe and secure access to resources in the enterprise has always been a delicate balance. Protecting corporate assets from intrusions and misuse is paramount. But a system that neglects usability for employees creates frustration and inefficiencies. At Microsoft, we’re in the midst of a major transformation in how we manage access to our corporate […]

The post Transforming our VPN with Global Secure Access at Microsoft appeared first on Inside Track Blog.

]]>
Ensuring safe and secure access to resources in the enterprise has always been a delicate balance. Protecting corporate assets from intrusions and misuse is paramount. But a system that neglects usability for employees creates frustration and inefficiencies.

At Microsoft, we’re in the midst of a major transformation in how we manage access to our corporate resources. The cornerstone of this change is Microsoft Global Secure Access (GSA), a security service edge (SSE) solution that replaces traditional VPNs with a modern, identity-centric model. GSA provides three core services integrated into a unified framework: Microsoft 365 Access, Internet Access, and Private Access. This approach not only strengthens our enterprise security posture but also simplifies connectivity for both users and administrators.

A photo of Apple.

“Years ago, the concept of a VPN was simple: a single virtual private network gave employees access to the company’s entire internal network. Today, this model presents serious risks.”

Pete Apple, principal cloud network engineer, Microsoft Digital

Over 158,000 of our employees are already using the GSA client and Microsoft 365, with full rollout of private and internet access planned in the coming months. Here’s how we’re building a more secure, seamless, and future-ready access experience across Microsoft’s ecosystems.

Beyond VPNs: the future of secure access

The idea that an internal network is inherently safer than the open internet has always been risky, and modern threats make that assumption dangerous. This is why we’ve embraced the Zero Trust model, shifting away from blanket access and moving toward least-privilege access—ensuring users only get what they need, when they need it, and nothing more.

Adopting a Zero Trust approach across the enterprise makes moving beyond traditional VPNs imperative. For years, we’ve relied on Microsoft VPN and Azure VPN to access internal resources. While effective, these traditional models operate on an “all-or-nothing” basis: once connected, employees gain broad access, regardless of role or security context.

“Years ago, the concept of a VPN was simple: a single virtual private network gave employees access to the company’s entire internal network,” says Pete Apple, a principal cloud network engineer in Microsoft Digital, the company’s IT organization. “Today, this model presents serious risks. If a user’s identity or device is compromised—or if a man-in-the-middle attack occurs—the attacker can connect through the VPN and gain broad access to sensitive data, soft targets, and critical systems.”

A photo of Triv.

“One of the primary reasons for this shift to GSA is that we get more granularity within this identity-based security solution, so we can control access on a very fine level.”

Gary Triv, principal network engineer, Microsoft Digital

This creates challenges for organizations like ours—and yours.

That’s where GSA can help.

It shifts the paradigm by introducing fine-grained, identity-based controls. Through deep integration with Microsoft Entra, administrators can enforce policies that adapt in real time, ensuring that sensitive resources are accessible only to the right users, on the right devices, under the right conditions.

“One of the primary reasons for this shift to GSA is that we get more granularity within this identity-based security solution, so we can control access on a very fine level,” says Gary Triv, a principal network engineer in Microsoft Digital.

The four pillars of GSA security

Our focus on security is built into everything we do.

“Conditional access, identity-centric controls, and other core elements of Zero Trust are built directly into the solution,” says Lalitha Mahajan, global technical program manager for Global Secure Access.

At the heart of GSA are four foundational security features:

  1. Conditional Access (CA): Unlike VPNs, which provide blanket access, CA enforces contextual rules to ensure role-appropriate access at all times. For example, an engineer may be allowed access to a security portal, while another user may only see Power BI dashboards.
  2. Continuous Access Evaluation (CAE): Access control doesn’t stop at login. CAE evaluates user context in real time. If an employee’s role changes, their credentials are revoked, or they leave the company, their active sessions are immediately terminated.
  3. Network Filtering: GSA allows administrators to define exactly where users can go on the internet or within corporate networks. This ensures employees have access only to approved destinations, reducing exposure to threats.
  4. Compliant Network (CN): Access is tied to the source network. For instance, a device in Redmond may be allowed, but the same device in an untrusted region could be blocked automatically.

Together, these pillars make GSA a secure and adaptive solution, fully aligned with the principles of Zero Trust.
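As a conceptual illustration only (GSA policies are configured in Microsoft Entra, not hand-coded), the contextual evaluation the first and fourth pillars describe reduces to a deny-by-default check over user, device, and network context. The policy table and names below are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class AccessContext:
    role: str
    device_compliant: bool
    network: str          # e.g. "corp-redmond", "untrusted-region"

# Hypothetical policy table: resource -> allowed roles and trusted networks.
POLICIES = {
    "security-portal": {"roles": {"engineer"}, "networks": {"corp-redmond"}},
    "powerbi-dashboards": {"roles": {"engineer", "analyst"},
                           "networks": {"corp-redmond", "corp-dublin"}},
}

def evaluate(resource, ctx):
    """Deny by default; grant only when role, device health, and
    source network all satisfy the resource's policy."""
    policy = POLICIES.get(resource)
    if policy is None or not ctx.device_compliant:
        return False
    return ctx.role in policy["roles"] and ctx.network in policy["networks"]

analyst = AccessContext(role="analyst", device_compliant=True, network="corp-redmond")
print(evaluate("powerbi-dashboards", analyst))  # True
print(evaluate("security-portal", analyst))     # False: role not allowed
```

Continuous Access Evaluation extends this idea in time: the same check is re-run whenever the context changes, not just at login.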

“With the Zero Trust model, our goal is to enforce least-privilege access. That means locking down internal resources, improving segmentation, and using firewalls and other controls so users can’t reach everything by default,” Apple says. “Instead of relying on a blanket VPN network, we’re moving to the Entra Global Secure Access model, which combines network and identity. Instead of granting broad visibility into the entire internal network, access can now be scoped to a user’s identity—so employees only connect to the resources defined for them.”

A photo of Mahajan.

“Unlike traditional VPNs, GSA delivers both client-side and server-side insights, all of which we own. This gives us deeper visibility and allows us to make the data more actionable for our use cases.”

Lalitha Mahajan, program manager, Microsoft Digital

A perfect example is a Microsoft developer—one of our most common employee roles.

Our developers may need access to specific source code, certain labs, and designated file shares. With GSA, we can grant access only to those resources—and nothing else. This shift from a blanket “once connected, you can see everything” approach to a tightly defined, identity-based model is a major security improvement and one of the most exciting reasons we’re moving forward with this product.

A key differentiator and critical Zero Trust enabler is GSA’s rich telemetry, which provides real-time visibility into user activity, device health, and network traffic. This continuous stream of data enables early detection of threats, anomaly detection, and precise policy enforcement—strengthening Zero Trust in practice.

“Unlike traditional VPNs, GSA delivers both client-side and server-side insights, all of which we own,” Mahajan says. “This gives us deeper visibility and allows us to make the data more actionable for our use cases.”

The key components of GSA

Private Access is just one of three offerings that make up GSA. Together, these offerings are unified under a single client that creates three dedicated tunnels—one for each service—while administrators centrally define routing and policy rules. GSA consists of:

  • Microsoft 365 Access: Optimized, policy-controlled connectivity for Office apps and services.
  • Internet Access: Secure browsing with TLS inspection, URL filtering, and content controls.
  • Private Access: A modern replacement for legacy VPNs that enables granular access to internal resources.

For Internet Access, GSA supports two deployment models: branch connectivity, where IPSec tunnels secure traffic from devices without a client (like printers), and client connectivity, where the GSA client routes laptop or desktop traffic directly to the GSA Edge. Both approaches enforce consistent policies, differing only in how traffic reaches the framework.

Advanced features and monitoring

Unlike fragmented VPN and firewall logs, GSA provides consistent visibility through unified logging, which consolidates session data—including user identity, device, source, destination, and applied policies—into a single view. We can now easily validate whether security features are working as intended and forward logs to Microsoft Sentinel for extended monitoring.

This holistic view provides us with a major advantage against cyber threats, enabling faster investigations and clearer correlations between user behavior and network activity.

Our rollout of GSA is well underway internally at Microsoft. With more than 158,000 GSA client and Microsoft 365 users already onboard, the next phase will expand private access company-wide, followed by broader adoption of internet access. Early pilots have demonstrated strong results, with positive feedback on both usability and the ability to solve unique access challenges.

By delivering a complete, identity-based secure access solution—spanning Microsoft 365, internet, and private connectivity—Microsoft is redefining enterprise access for the cloud-first era. The result is a future where connectivity is not only seamless but also secure, adaptive, and tightly aligned with user identity and context.

Key takeaways

Our experience transitioning to GSA Private Access has left us with several key insights that other enterprises can apply to their own efforts to modernize remote access:

  • Adopt least-privilege access: Move away from blanket network access to ensure employees only reach the resources they need.
  • Reduce risk from compromised accounts: Limit the blast radius of identity or device breaches by segmenting and scoping access.
  • Continuously evaluate trust: Treat access as dynamic, adapting in real time to changes in user roles, device health, or network conditions.
  • Improve visibility through telemetry: Use detailed activity and traffic data to spot anomalies early and strengthen security decisions.
  • Unify security and connectivity: Align access with identity and context, creating a balance between strong protection and seamless user experience.

The post Transforming our VPN with Global Secure Access at Microsoft appeared first on Inside Track Blog.

]]>
Modernizing IT infrastructure at Microsoft: A cloud-native journey with Azure http://approjects.co.za/?big=insidetrack/blog/modernizing-it-infrastructure-at-microsoft-a-cloud-native-journey-with-azure/ Thu, 04 Sep 2025 16:00:00 +0000 http://approjects.co.za/?big=insidetrack/blog/?p=20125 Engage with our experts! Customers or Microsoft account team representatives from Fortune 500 companies are welcome to request a virtual engagement on this topic with experts from our Microsoft Digital team. At Microsoft, we are proudly a cloud-first organization: Today, 98% of our IT infrastructure—which serves more than 200,000 employees and incorporates over 750,000 managed […]

The post Modernizing IT infrastructure at Microsoft: A cloud-native journey with Azure appeared first on Inside Track Blog.

]]>

Engage with our experts!

Customers or Microsoft account team representatives from Fortune 500 companies are welcome to request a virtual engagement on this topic with experts from our Microsoft Digital team.

At Microsoft, we are proudly a cloud-first organization: Today, 98% of our IT infrastructure—which serves more than 200,000 employees and incorporates over 750,000 managed devices—runs on the Microsoft Azure cloud.

The company’s massive transition from traditional datacenters to a cloud-native infrastructure on Azure has fundamentally reshaped our IT operations. By adopting a cloud-first, DevOps-driven model, we’ve realized significant gains in agility, scalability, reliability, operational efficiency, and cost savings.

“We’ve created a customer-focused, self-serve management environment centered around Azure DevOps and modern engineering principles,” says Pete Apple, a technical program manager and cloud architect in Microsoft Digital, the company’s IT organization. “It has really transformed how we do IT at Microsoft.”

“Our service teams don’t have to worry about the operating system. They just go to a website, fill in their info, add their data, and away they go. That’s a big advantage in terms of flexibility.”

Apple is shown in a portrait photo.
Pete Apple, technical program manager and cloud architect, Microsoft Digital

What it means to move from the datacenter to the cloud

Historically, our IT environment was anchored in centralized, on-premises datacenters. The initial phase of our cloud transition involved a lift-and-shift approach, migrating workloads to Azure’s infrastructure as a service (IaaS) offerings. Over time, the company evolved toward more of a decentralized, platform as a service (PaaS) DevOps model.

“In the last six or seven years we’ve seen a lot more focus on PaaS and serverless offerings,” says Faisal Nasir, a principal architect in Microsoft Digital. “The evolution is also marked by extensibility—the ability to create enterprise-grade applications in the cloud—and how we can design well-architected end-to-end services.”

Because we’ve moved nearly all our systems to the cloud, we have a very high level of visibility into our network operations, according to Nasir. We can now leverage Azure’s native observability platforms, extending them to enable end-to-end monitoring, debugging, and data collection on service usage and performance. This capability supports high-quality operations and continuous improvement of cloud services.

“Observability means having complete oversight in terms of monitoring, assessments, compliance, and actionability,” Nasir says. “It’s about being able to see across all aspects of our systems and our environments, and even from a customer lens.”

Decentralizing our IT services with Azure

As Microsoft was becoming a cloud-first organization, the nature of the cloud and how we use it changed. As Microsoft Azure matured and more of our infrastructure and services moved to the cloud, we began to move away from IT-owned applications and services.

The strength of Azure’s self-service and management features means that individual business groups can handle many of the duties that Microsoft Digital formerly offered as an IT service provider—which enables each group to build agile solutions to match their specific needs.

“Our goal with our modern cloud infrastructure continues to be a solution that transforms IT tasks into self-service, native cloud solutions for monitoring, management, backup, and security across our entire environment,” Apple says. “This way, our business groups and service lines have reliable, standardized management tools, and we can still maintain control over and visibility into security and compliance for our entire organization.”

The benefits to our businesses of this decentralized model of IT services include:

  • Empowered, flexible DevOps teams
  • A native cloud experience: subscription owners can use features as soon as they’re available
  • Freedom to choose from marketplace solutions
  • Minimal subscription limit issues
  • Greater control over groups and permissions
  • Better insights into Microsoft Azure provisioning and subscriptions
  • Business group ownership of billing and capacity management

“With the PaaS model, and SaaS (software as a service), it’s more DIY,” Apple says. “Our service teams don’t have to worry about the operating system. They just go to a website, fill in their info, add their data, and away they go. That’s a big advantage in terms of flexibility.”

“The idea of centralized monitoring is gone. The new approach is that service teams monitor their own applications, and they know best how to do that.”

Delamarter is shown in a portrait photo.
Cory Delamarter, principal software engineering manager, Microsoft Digital

Leveraging the power of Azure Monitor

Microsoft Azure Monitor is a comprehensive monitoring solution for collecting, analyzing, and responding to monitoring data from cloud and on-premises environments. Across Microsoft, we use Azure Monitor to ensure the highest level of reliability for our services and applications.

Specifically, we rely on Azure Monitor to:

Create visibility. There’s instant access to fundamental metrics, alerts, and notifications across core Azure services for all business units. Azure Monitor also covers production and non-production environments as well as native monitoring support across Microsoft Azure DevOps.

Provide insight. Business groups and service lines can view rich analytics and diagnostics across applications and their compute, storage, and network resources, including anomaly detection and proactive alerting.

Enable optimization. Monitoring results help our business groups and service lines understand how users are engaging with their applications, identify sticking points, develop cohorts, and optimize the business impact of their solutions.

Deliver extensibility. Azure Monitor is designed for extensibility, with support for custom event ingestion and broader analytics scenarios.

Because we’ve moved to a decentralized IT model, much of the monitoring work has moved to the service team level as well.

“The idea of centralized monitoring is gone,” says Cory Delamarter, a principal software engineering manager in Microsoft Digital. “The new approach is that service teams monitor their own applications, and they know best how to do that.”

Patching and updating, simplified

Moving our operations to the cloud also means a simpler and more automated approach to patching and updating. The shift to PaaS and serverless networking has allowed us to manage infrastructure patching centrally, which is much more scalable and efficient. The extensibility of our cloud platforms reduces integration complexity and accelerates deployment.

“It depends on the model you’re using,” Nasir says. “With the PaaS and serverless networks, the service teams don’t need to worry about patching. With hybrid infrastructure systems, being in the cloud helps with automation of patching and updating. There’s a lot of reusable automation layers that help us build end-to-end patching processes in a faster and more reliable manner.”

Apple stresses the flexibility this offers a large organization, allowing teams to choose how they do their patching and updating.

“In the datacenter days, we ran our own centralized patching service, and we picked the patching windows for the entire company,” Apple says. “By moving to more automated self-service, we provide the tools and the teams can pick their own patching windows. That also allowed us to have better conversations, asking the teams if they want to keep doing the patching or if they want to move up the stack and hand it off to us. So, we continue to empower the service teams to do more and give them that flexibility.”
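The self-service model Apple describes can be sketched as a simple registry where each team picks its own window. The team names and schedules below are hypothetical; a real implementation would live in an automation platform, not a script.

```python
from datetime import datetime, timedelta

# Hypothetical self-service registry: each team registers its own window.
PATCH_WINDOWS = {
    "payments": {"weekday": 5, "hour": 2},   # Saturday 02:00
    "identity": {"weekday": 6, "hour": 23},  # Sunday 23:00
}

def next_patch_window(team, now):
    """Return the start of the team's next scheduled patching window."""
    w = PATCH_WINDOWS[team]
    candidate = now.replace(hour=w["hour"], minute=0, second=0, microsecond=0)
    candidate += timedelta(days=(w["weekday"] - now.weekday()) % 7)
    if candidate <= now:
        candidate += timedelta(days=7)  # Window already passed this week.
    return candidate
```

The point of the design is that the platform owns the mechanism while each service team owns the schedule.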

Securing our infrastructure in a cloud-first environment

As security has become an absolute priority for Microsoft, it’s also been a foundational element of our cloud strategy.

Being a cloud-first company has made it easier to be a security-first organization as well.

“The cloud enables us to embed security by design into everything we build,” Nasir says. “At enterprise scale, adopting Zero Trust and strong governance becomes seamless, with controls engineered in from the start, not retrofitted later. That same foundation also prepares us for an AI-first future, where resilience, compliance, and automation are built into every system.”

Cloud-native security features combined with integrated observability allow for better compliance and risk management. Delamarter agrees that the cloud has had huge benefits when it comes to enhancing network security.

“Our code lives in repositories now, and so there’s a tremendous amount of security governance that we’ve shifted upstream, which is huge,” Delamarter says. “There are studies that show that the earlier you can find defects and address them, the less expensive they are to deal with. We’re able to catch security issues much earlier than before.”

“There are less and less manual actions required, and we’re automating a lot of business processes. It basically gives us a huge scale of automation on top of the cloud.”

Nasir is shown in a portrait photo.
Faisal Nasir, principal architect, Microsoft Digital

We use Azure Policy, which helps enforce organizational standards and assess compliance at scale using dashboards and other monitoring tools.

“Azure Policy was a key part of our security approach, because it essentially offers guardrails—a set of rules that says, ‘Here’s the defaults you must use,’” Apple says. “You have to use a strong password, for example, and it has to be tied to an Azure Active Directory ID. We can dictate really strong standards for everything and mandate that all our service teams follow these rules.”
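The guardrail idea Apple describes can be sketched as a rule evaluated against each resource. Real Azure Policy definitions are JSON documents evaluated by the platform; the fields and rule below are hypothetical simplifications of that model.

```python
# Illustrative guardrail: accounts must use a strong password tied to a
# directory identity. The resource fields are hypothetical.
DENY_WEAK_AUTH = {
    "if": lambda r: not r.get("directory_linked") or r.get("password_strength") != "strong",
    "effect": "deny",
}

def evaluate(resource, policy):
    """Return the policy effect for a resource, or 'allow' if compliant."""
    return policy["effect"] if policy["if"](resource) else "allow"

compliant = {"directory_linked": True, "password_strength": "strong"}
drifted = {"directory_linked": False, "password_strength": "strong"}
```

Because every service team’s resources pass through the same evaluation, the defaults become mandatory rather than advisory.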

AI-driven operations in the cloud

Just like its impact on the rest of the technology world, AI is in the process of transforming infrastructure management at Microsoft. Tasks that used to be manual and laborious are being automated in many areas of the company, including network operations.

“AI is creating a new interface of agents that allow users to interact with large ecosystems of applications, and there’s much easier and more scalable integration,” says Nasir. “There are less and less manual actions required, and we’re automating a lot of business processes. Microsoft 365 Copilot, Security Copilot, and other AI tools are giving us shared compute and extensibility to produce different agents. It basically gives us a huge scale of automation on top of the cloud.”

Apple notes that powerful AI tools can be combined with the incredible amount of data that the Microsoft IT infrastructure generates to gain insights that simply weren’t possible before.

“We can integrate AI with our infrastructure data lakes and use tools like Network Copilot to query the data using natural language,” Apple says. “I can ask questions like, ‘How many of our virtual machines need to be patched?’ and get an answer. It’s early, and we’re still experimenting, but the potential to interact with this data in a more automated fashion is exciting.”
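The kind of question Apple mentions can be sketched as a mapping from natural language to a query over an inventory. The keyword routing below is a toy stand-in for what an LLM-backed tool like Network Copilot would do; the inventory data is hypothetical.

```python
# Hypothetical VM inventory pulled from an infrastructure data lake.
INVENTORY = [
    {"name": "vm-01", "needs_patch": True},
    {"name": "vm-02", "needs_patch": False},
    {"name": "vm-03", "needs_patch": True},
]

def answer(question, inventory):
    """Toy natural-language router: a real system would translate the
    question into a query language via an LLM."""
    if "patch" in question.lower():
        count = sum(1 for vm in inventory if vm["needs_patch"])
        return f"{count} virtual machines need to be patched."
    return "Sorry, I can't answer that yet."
```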

Ultimately, Microsoft has become a cloud-first company, and that has allowed us to work toward an AI-first mentality in everything we do.

“Having a complete observability strategy across our infrastructure modernization helps us to make sure that whatever changes we’re making, we have a design-first approach and a cloud-first mindset,” Nasir says. “And now that focus is shifting towards an AI-first mindset as well.”

Key takeaways

Here are some of the benefits we’ve accrued by becoming a cloud-first IT organization at Microsoft:

  • Transformed operations: By moving from our legacy on-premises datacenters, through Azure’s infrastructure as a service (IaaS) offerings, and eventually to a platform as a service (PaaS) DevOps model, we’ve reaped great gains in reliability, efficiency, scalability, and cost savings.
  • A clear view: With 98% of our organization’s IT infrastructure running in the Azure cloud, we have a huge level of observability into our systems—complete oversight into network assessment, monitoring, compliance, patching/updating, and many other aspects of operations.
  • Empowered teams: Operating a cloud-first environment allows us to have a more decentralized approach to IT infrastructure. This means we can offer our business groups and service lines more self-service, cloud-native solutions for monitoring, management, patching, and backup while still maintaining control over and visibility into security and compliance for our entire organization.
  • Seamless updates: The shift to PaaS and serverless networking has enabled a more planned and automated approach to patching and updating our infrastructure, which produces greater efficiency, integration, and speed of deployment.
  • Dependable security: Our cloud environment has allowed us to implement security by design, including tighter control over code repositories and the use of standard security policies across the organization with Azure Policy.
  • Future-proof infrastructure: As we shift to an AI-first mindset across Microsoft, we’re using AI-driven tools to enhance and maintain our native cloud infrastructure and adopt new workflows that will continue to reap dividends for our employees and our organization.  

The post Modernizing IT infrastructure at Microsoft: A cloud-native journey with Azure appeared first on Inside Track Blog.

]]>
20125
Smarter labs, faster fixes: How we’re using AI to provision our virtual labs more effectively http://approjects.co.za/?big=insidetrack/blog/smarter-labs-faster-fixes-how-were-using-ai-to-provision-our-virtual-labs-more-effectively/ Thu, 24 Jul 2025 16:00:00 +0000 http://approjects.co.za/?big=insidetrack/blog/?p=19628 Microsoft Digital stories Providing technical support at an enterprise of our size here at Microsoft is a constant balancing act between speed, quality, and scalability. Systems grow more complex, user expectations continue to rise, and traditional support models often struggle to keep up. Beyond the usual apps and systems everyone uses, many of our employees […]

The post Smarter labs, faster fixes: How we’re using AI to provision our virtual labs more effectively appeared first on Inside Track Blog.

]]>

Microsoft Digital stories

Providing technical support at an enterprise of our size here at Microsoft is a constant balancing act between speed, quality, and scalability. Systems grow more complex, user expectations continue to rise, and traditional support models often struggle to keep up. Beyond the usual apps and systems everyone uses, many of our employees require virtual provisioning for diverse tasks in many of our businesses. Supporting these virtualized environments is a special challenge.

To meet the growing demand for virtual lab usage across the organization, we turned to AI, not just to automate support responses but to fundamentally rethink how user issues are identified and resolved. This vision came to life through the MyWorkspace platform, where we in Microsoft Digital, the company’s IT organization, introduced a domain-specific AI assistant to streamline how we empower our employees to deploy new virtual labs.

The results have been dramatic: what was once a slow, manual process is now fast, efficient, and frictionless.

But the benefits extend beyond faster resolution times. This transformation represents a new approach to enterprise support—one that uses AI not just as a tool for efficiency, but as a strategic enabler. By embedding intelligence into the support experience, we’re turning complexity into a competitive advantage.

Scaling support in a high-demand environment

MyWorkspace is our internal platform for provisioning virtual labs. Originally developed to support internal testing, diagnostics, and development environments, it has since grown into a critical resource used by thousands of engineers and support personnel across the company.

Scaling the platform infrastructure was straightforward—adding capacity for tens of thousands of virtual labs was a technical challenge we could solve with ease, thanks to our Microsoft Azure backbone. As usage grew, the real strain didn’t show up in CPU load or storage limits, but rather in the support queue. Every few months, a new wave of users was onboarded to MyWorkspace: partner teams, internal engineers, and external vendors. These new users, often with minimal experience of the platform, needed fast access and guidance from support.

The questions, though simple, piled up quickly.

Several Tier 1 support engineers repeatedly encountered the same questions from users, such as how to start a lab, what an error meant, and which lab to use for a particular test. These weren’t complex technical issues—they were basic, repetitive onboarding questions that represented a huge opportunity to introduce automation.

“We also found that there were a lot of users who found more niche issues, and those issues had been solved either by our community or by ourselves. In fact, we had a dedicated Teams channel specific to customer issues, and we found that there was a lot of repetition and that other customers were facing similar issues, and we did have a bit of a knowledge base in terms of how to solve these issues.”

A photo of Deans.
Joshua Deans, software engineer, Microsoft Digital

Unblocking a bottleneck with AI

Our support team set out to tackle a familiar but costly challenge: high volumes of low-complexity tickets that consumed valuable time without delivering meaningful impact. Instead of treating this as an unavoidable burden, we saw an opportunity to turn it into a self-scaling solution. If the same questions were being asked repeatedly—and the answers already existed in documentation, internal threads, or institutional knowledge—then an AI system should be able to surface those answers instantly, without human intervention.

“We also found that there were a lot of users who found more niche issues, and those issues had been solved either by our community or by ourselves,” says Joshua Deans, a software engineer within Microsoft Digital. “In fact, we had a dedicated Teams channel specific to customer issues, and we found that there was a lot of repetition and that other customers were facing similar issues, and we did have a bit of a knowledge base in terms of how to solve these issues.”

That insight led the MyWorkspace team to begin building what would become a transformative AI assistant: an automated support layer purpose-built for the MyWorkspace platform. Unlike traditional chatbots that rely on scripted responses or rigid decision trees, this assistant would leverage generative AI trained on a rich dataset of real-world support conversations, internal FAQs, and official documentation.

“So that’s where we found this opportunity to turn this scaling challenge into a scaling advantage, with help from AI. We took all those historical conversations of tier one staff helping new users—trained our AI to provide user education based on that training—and saved our Tier 1 staff from answering potential tickets.”

Vikram Dadwal, principal software engineering manager, Microsoft Digital

The result was a context-aware, responsive system capable of resolving common issues in seconds—not hours or days—dramatically easing the load on support teams while improving the user experience.

Built on Azure and Semantic Kernel

MyWorkspace’s core infrastructure is fully built on Azure services. At any given moment, it manages tens of thousands of virtual machines, scaling up and down with demand. That elasticity, combined with our internal developer tooling and AI orchestration capabilities, provided the perfect environment for an AI-powered support layer.

“So that’s where we found this opportunity to turn this scaling challenge into a scaling advantage, with help from AI,” says Vikram Dadwal, a principal software engineering manager within Microsoft Digital. “We took all those historical conversations of tier one staff helping new users—trained our AI to provide user education based on that training—and saved our Tier 1 staff from answering potential tickets.”

To build the assistant, the team used our Microsoft open-source framework, Semantic Kernel. Designed for generative AI integration, Semantic Kernel allows engineers to create prompt-driven, modular systems that can interact with large language models (LLMs) without vendor lock-in.

This approach gave the team several advantages:

  • Flexibility in choosing and switching between LLM providers.
  • Fine-grained control over how prompts were structured and updated.
  • Extensibility through plugins and actions that tie the assistant into the broader ecosystem.

Crucially, the assistant was designed to be part of the platform’s architecture, capable of operating at the same level of scale and responsiveness as the labs it supported. Also, the assistant was initialized with a well-scoped system prompt, limiting its responses strictly to the MyWorkspace domain.
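The domain scoping described above can be sketched as a system prompt that always leads the conversation, optionally paired with a cheap pre-filter. The prompt text, keywords, and message shape below are hypothetical; the real assistant is built with Semantic Kernel and an LLM.

```python
# Hypothetical scoped system prompt for the MyWorkspace assistant.
SYSTEM_PROMPT = (
    "You are the MyWorkspace support assistant. Answer only questions "
    "about the MyWorkspace virtual lab platform. If a question is out "
    "of scope, decline politely."
)

IN_SCOPE_KEYWORDS = ("lab", "workspace", "provision", "vm")

def build_messages(user_question):
    """Assemble the chat messages, always leading with the scoped prompt."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_question},
    ]

def is_in_scope(user_question):
    """Cheap pre-filter a real system might pair with the prompt."""
    return any(k in user_question.lower() for k in IN_SCOPE_KEYWORDS)
```

Scoping at the prompt level keeps the assistant from wandering outside its domain even when the underlying model could answer almost anything.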

“On average, we measured these interactions at around 20 minutes from ticket submission to problem resolution. Now compare that with a 30-second AI interaction for resolving the same class of issues—that’s a 98% reduction in resolution time, a number we’ve validated with our support team and continue to track.”

Nathan Prentice, senior product manager, Microsoft Digital

Shifting from tickets to conversations

Whether users had questions about lab types, needed clarification on configuration details, or sought guidance during onboarding, the AI provided accurate, interactive responses without requiring human escalation. The experience was both faster and significantly better. Support engineers saw a noticeable reduction in repeat tickets, as common issues were resolved on the spot. Onboarding friction decreased, and users were confident that they could get the answers they needed instantly—no ticket, no delay, no need to track a support contact.

“On average, we measured these interactions at around 20 minutes from ticket submission to problem resolution,” says Nathan Prentice, a senior product manager within Microsoft Digital. “Now compare that with a 30-second AI interaction for resolving the same class of issues—that’s a 98% reduction in resolution time, a number we’ve validated with our support team and continue to track.”
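The 98% figure follows directly from the numbers Prentice quotes: 20 minutes per ticket down to a 30-second AI interaction.

```python
# Worked check of the reduction quoted above.
before_s = 20 * 60   # 20 minutes per ticket, in seconds
after_s = 30         # seconds per AI interaction
reduction = (before_s - after_s) / before_s
# (1200 - 30) / 1200 = 0.975, i.e., roughly 98% when rounded.
```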

Smart, interactive, and intuitive

Our Microsoft Digital team has recently implemented a new version of the MyWorkspace AI assistant that includes several major enhancements. The assistant now features adaptive cards, polished layouts, and a Microsoft 365 Copilot-aligned user experience, making it feel familiar and trustworthy for internal teams. The assistant can now distinguish between a question and an action. If a user says, “Start a SharePoint lab,” it responds with an interactive card and begins provisioning, bridging the gap between passive support and active enablement.

“One of the primary bottlenecks we previously faced in creating an AI solution to address frequently asked user questions was the lack of technology capable of generating accurate answers for complex technical queries and understanding nuanced user input. With the availability of Azure OpenAI models, we were able to effectively overcome this challenge, enabling our AI solution to deliver precise and context-aware responses at scale.”

A photo of Nair.
Anjali Nair, senior software engineer, Microsoft Digital

To guide our employees and improve discoverability, the assistant offers recommended prompts—just like Copilot does—helping new users understand what they can ask and how to get started.

Users can now rate responses, giving a thumbs up or down. These signals are aggregated and reviewed by the engineering team, ensuring continuous improvement and fine tuning over time.

Intelligent provisioning with multi-agent orchestration 

At Microsoft Digital, we’re reimagining how labs are provisioned by integrating AI-driven intelligence into the process. Traditionally, users are expected to know exactly what kind of lab environment they need. But in complex virtualization and troubleshooting scenarios, these assumptions often fall short. Should a user troubleshooting hybrid issues with Microsoft Exchange spin up a basic Exchange lab, or one that includes Azure AD integration, conditional access policies, and hybrid connectors? To eliminate this guesswork, our team is building a multi-agent system powered by the Semantic Kernel SDK multi-agent framework. This system interprets the user’s support context—often expressed in natural language—and automatically provisions the most relevant lab environment.

For example, a user might say, “I’m seeing sync issues between SharePoint Online and on-prem,” and the assistant will orchestrate the creation of a tailored lab that replicates that exact scenario, enabling faster diagnosis and resolution. With agent orchestration, each agent in the system is specialized: one might handle context interpretation, another lab configuration, and another cost optimization. These agents collaborate to ensure that the lab not only meets technical requirements but is also cost-effective. By leveraging telemetry and historical usage data, the system can recommend leaner configurations—such as using ephemeral VMs, auto-pausing idle resources, or selecting lower-cost SKUs—without compromising diagnostic fidelity. This intelligent provisioning framework is designed to scale, adapt, and continuously learn from usage patterns.
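The flow described above can be sketched as three specialized agents chained together. Everything here is hypothetical, including the scenario names, lab templates, and SKUs; the production system orchestrates real agents through the Semantic Kernel multi-agent framework.

```python
def context_agent(request):
    """Interpret the user's support context from natural language."""
    text = request.lower()
    if "sharepoint" in text and ("sync" in text or "on-prem" in text):
        return "sharepoint-hybrid-sync"
    return "generic"

def config_agent(scenario):
    """Pick a lab template that matches the interpreted scenario."""
    templates = {
        "sharepoint-hybrid-sync": {"vms": 3, "sku": "D4s", "hybrid_connector": True},
        "generic": {"vms": 1, "sku": "D2s", "hybrid_connector": False},
    }
    return dict(templates[scenario])

def cost_agent(config):
    """Trim cost without losing diagnostic fidelity."""
    config.update({"ephemeral": True, "auto_pause_minutes": 30})
    return config

def provision(request):
    """Orchestrate the three agents in sequence."""
    return cost_agent(config_agent(context_agent(request)))
```

Keeping each agent narrow makes the pipeline easy to extend: a new concern, such as compliance checks, becomes another agent in the chain.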

“One of the primary bottlenecks we previously faced in creating an AI solution to address frequently asked user questions was the lack of technology capable of generating accurate answers for complex technical queries and understanding nuanced user input,” says Anjali Nair, a senior software engineer within Microsoft Digital. “With the availability of Azure OpenAI models, we were able to effectively overcome this challenge, enabling our AI solution to deliver precise and context-aware responses at scale.”

With multi-agent orchestration, we’re taking a step towards a future where lab environments are not just automated, but intelligently orchestrated, context-aware, and cost-optimized—empowering engineers to focus on solving problems, not setting up infrastructure.

Scaling support without scaling headcount

The MyWorkspace assistant is a powerful example of how enterprise support can evolve through intelligence. By embedding AI into the support experience, we’ve turned complexity into a competitive edge—reshaping knowledge work and operations through AI’s problem-solving capabilities. As Microsoft advances as a Frontier Firm, MyWorkspace shows how we can scale support on demand, with intelligence built in. Routine queries are offloaded to AI, freeing Tier 1 teams to focus on critical issues and giving Tier 2 engineers space to innovate. Most importantly, support now scales with user demand—not headcount.

But this system does more than just respond—it learns. Every interaction becomes a data point. Each resolved issue feeds back to the assistant, sharpening its accuracy and expanding its knowledge. What started as a reactive Q&A tool is now growing into a proactive orchestrator that surfaces insights and points users to solutions, resolving issues before they ever become tickets.

“We have a lot more telemetry now, so users can provide feedback to our responses—for example, thumbs up or thumbs down feedback,” Deans says. “And we can actually view where the model is giving incorrect or inappropriate information, and we can use that to make adjustments to the prompt provided to the model.”

In this model, support becomes a seamless extension of the user experience. With the right AI architecture in place, it transforms a cost center into a strategic asset. The MyWorkspace assistant fulfills its role as an embedded, intelligent teammate—delivering answers, driving actions, and continuously improving over time.

Ultimately, our journey with MyWorkspace shows that meaningful AI adoption doesn’t have to begin with sweeping transformation. Sometimes, it starts with a helpdesk queue, a recurring issue, and the choice to build something smarter—something that learns, adapts, and empowers at every step.

Key takeaways

Here are some of our top insights from boosting our internal deployment of MyWorkspace with AI and continuous improvement.

  • Start small and specific. Focus on a defined domain—like MyWorkspace—and use existing support logs to train your assistant.
  • Invest in AI infrastructure. Tools like Semantic Kernel provide flexibility, especially in enterprise settings where vendor neutrality and customization matter.
  • Design for trust. Align your assistant’s UI with well-known systems like Microsoft Copilot to build user confidence.
  • Don’t wait for perfection. Launch a V1, gather feedback, and make improvements. AI assistants get better over time if you let them learn.
  • Think outside the ticket queue. The future isn’t just faster support—it’s intelligent, anticipatory systems that eliminate friction before it begins.

The post Smarter labs, faster fixes: How we’re using AI to provision our virtual labs more effectively appeared first on Inside Track Blog.

]]>
19628
Securing the borderless enterprise: How we’re using AI to reinvent our network security http://approjects.co.za/?big=insidetrack/blog/securing-the-borderless-enterprise-how-were-using-ai-to-reinvent-our-network-security/ Thu, 10 Jul 2025 16:00:00 +0000 http://approjects.co.za/?big=insidetrack/blog/?p=19504 The modern enterprise network is complex, to say the least. Enterprises like ours are increasingly adopting hybrid infrastructures that span on-premises data centers, multiple cloud environments, and a diverse array of remote users. In this context, traditional security tools are still playing checkers while the malicious actors are playing chess. To make matters worse, attacks […]

The post Securing the borderless enterprise: How we’re using AI to reinvent our network security appeared first on Inside Track Blog.

]]>
The modern enterprise network is complex, to say the least.

Enterprises like ours are increasingly adopting hybrid infrastructures that span on-premises data centers, multiple cloud environments, and a diverse array of remote users. In this context, traditional security tools are still playing checkers while the malicious actors are playing chess. To make matters worse, attacks are increasingly enabled by AI tools.

That’s why here in Microsoft Digital, the company’s IT organization, we’re using a modern approach and toolset—including AI—to secure our network environment, turning complexity into clarity, one approach, tool, and insight at a time.

Leaving traditional network security behind

For years, traditional network security relied on a simple but increasingly outdated assumption: everything inside the corporate perimeter can be trusted. This model made sense when networks were static, users were on-premises, and applications lived in a centralized data center.

But that world is gone.

A photo of Venkatraman.

“Implicit trust must be replaced with explicit verification. That means rethinking how we monitor, how we respond, and how we design for resilience from the start.”

Raghavendran Venkatraman, principal cloud network engineering manager, Microsoft Digital

Today’s enterprise is dynamic, decentralized, and borderless. Hybrid work has become the norm. Cloud adoption is accelerating. Teams are globally distributed. Devices and data move constantly across environments. In this new reality, the network perimeter hasn’t just shifted—it has effectively vanished.

That’s where the cracks in legacy security models become impossible to ignore.

Visibility becomes fragmented. Security teams struggle to track what’s happening across a sprawling digital estate. Traditional monitoring tools focus on infrastructure uptime or device health—not on the actual experience of the people using the network. That disconnect creates blind spots, and blind spots create risk.

We know that this model no longer meets the needs of a modern, AI-powered enterprise. Every enterprise needs a new approach—one that assumes breach, enforces least-privilege access, and continuously verifies trust.

“Implicit trust must be replaced with explicit verification,” says Raghavendran Venkatraman, a principal cloud network engineering manager in Microsoft Digital. “That means rethinking how we monitor, how we respond, and how we design for resilience from the start.”

This shift is foundational to our security strategy. It’s not just about securing infrastructure—it’s about securing the experience. Because in a world where users, data, and threats are everywhere, trust has to be proved, not assumed.

Building a resilient and adaptive security strategy

To secure hybrid corporate networks effectively, organizations must go beyond traditional perimeter defenses. They need a comprehensive and adaptive security strategy—one that evolves with the threat landscape and aligns with the complexity of modern enterprise environments. The diversity of hybrid networks introduces new vulnerabilities and expands the attack surface. A static, one-size-fits-all approach simply doesn’t work anymore.

At Microsoft Digital, we’ve embraced a layered, cloud-first security model that integrates identity, access, encryption, and monitoring across every layer of the network. It’s embedded in everything we do. This model includes these key strategies, which we’ll expand upon in the following sections:

  • Adopting Zero Trust principles
  • Establishing identity as the new perimeter 
  • Integrating AI and machine learning
  • Enforcing network segmentation
  • Embracing continuous monitoring

Adopting Zero Trust principles

Zero Trust Architecture (ZTA) operates on a strict principle: “never trust, always verify.” That means no user, device, or application—whether inside or outside the corporate network—is inherently trusted, as it would be in the traditional network security model.

A photo of McCleery.

“Zero Trust isn’t a product—it’s a mindset. It’s about assuming breach and designing defenses that minimize impact and maximize resilience.”

Tom McCleery, principal group cloud network engineer, Microsoft Digital

Every access request is evaluated against dynamic policies. These policies consider several factors—like user identity, device health, location, and how sensitive the data being accessed is. For example, if an employee tries to access a financial report from a corporate laptop at the office, they might get in, no problem. But that same request from a personal device in another country could get blocked or trigger extra authentication steps.

At the heart of ZTA are policy enforcement points that authorize every data flow. These checkpoints only grant access when all conditions are met, and they log every interaction for auditing and threat detection. This kind of granular control reduces the attack surface and limits lateral movement if there is a breach.
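The dynamic policy evaluation in the financial-report example above can be sketched as a function over the request’s context. The factors and decisions below are simplified, hypothetical stand-ins for real conditional access policies.

```python
def evaluate_access(identity_verified, device_managed, location_trusted,
                    data_sensitivity):
    """Return 'allow', 'step-up' (extra authentication), or 'deny' for
    an access request, in the spirit of a Zero Trust policy check."""
    if not identity_verified:
        return "deny"
    if data_sensitivity == "high":
        if device_managed and location_trusted:
            return "allow"       # Corporate laptop at the office.
        if device_managed or location_trusted:
            return "step-up"     # One risk factor: require extra auth.
        return "deny"            # Personal device, untrusted location.
    return "allow" if device_managed else "step-up"
```

A real policy enforcement point would also log the decision for auditing, as the paragraph above notes.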

Adopting Zero Trust isn’t just a technical upgrade—it’s a strategic must. It boosts an organization’s ability to defend against modern threats like ransomware, insider attacks, and supply chain compromises.

“Zero Trust isn’t a product—it’s a mindset,” says Tom McCleery, a principal group cloud network engineer in Microsoft Digital. “It’s about assuming breach and designing defenses that minimize impact and maximize resilience.”

By embracing Zero Trust, we strengthen our security posture, lower the risk of data breaches, and respond more effectively to emerging threats.

Establishing identity as the new perimeter

Identity is no longer just a component of security—it has become the new perimeter. Traditional security models focused on defending the network edge, assuming that everything inside the perimeter could be trusted. But in today’s hybrid and cloud-first environments, the perimeter has dissolved, and that assumption is outdated and dangerous. Users, devices, and applications now operate across diverse locations and platforms, making perimeter-based defenses insufficient.

Identity-first security shifts the focus from securing the physical network to securing the identities—both human and machine—that interact with the network. This means every access request is treated as though it originates from an untrusted source, regardless of where it comes from. Whether it’s a remote employee logging in from a personal device or an automated workload accessing cloud resources, the system must verify who or what is making the request, assess the risk, and enforce least-privilege access across the user experience.

This approach enables organizations to implement more granular access controls. For example, a developer might be allowed to access a code repository but not production systems, and only during business hours from a managed device. Similarly, a service account used by a Continuous Integration and Continuous Deployment (CI/CD) pipeline might be restricted to specific APIs and monitored for anomalous behavior. A CI/CD pipeline is an automated workflow that takes code from development through testing and into production.
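The developer example above can be sketched as a default-deny grant table keyed by identity role and resource. The roles, hours, and resources are hypothetical illustrations, not a real access-control system.

```python
from datetime import time

# Hypothetical least-privilege grants: anything not listed is denied.
GRANTS = {
    ("developer", "code-repo"): {"hours": (time(8), time(18)), "managed_only": True},
}

def is_allowed(role, resource, now, device_managed):
    """Default-deny check: grant must exist, time must fall in the
    allowed window, and the device constraint must be satisfied."""
    grant = GRANTS.get((role, resource))
    if grant is None:
        return False  # No grant for production systems, so access is denied.
    start, end = grant["hours"]
    if not (start <= now <= end):
        return False
    return device_managed or not grant["managed_only"]
```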

By anchoring network security around verified identities, organizations reduce their attack surface and improve their ability to detect and respond to threats. This identity-centric model is not just a security enhancement—it’s a strategic shift that aligns with how modern enterprises operate.

Integrating AI and machine learning 

AI and machine learning (ML) are foundational pillars in our network security strategy. Intelligent automation and advanced analytics help us not only detect and respond to threats, but also continuously improve our security posture in an ever-changing landscape. Here’s how we’re using AI and ML in some critical aspects of our approach to modern network security:

  • Threat detection and intelligence. We deploy AI-powered monitoring tools that sift through billions of network signals and logs across our hybrid infrastructure. By applying sophisticated ML algorithms, we can identify abnormal behaviors such as unusual login attempts or unexpected data transfers that could indicate a potential breach. These insights allow our security teams to focus on the most critical alerts, reducing noise and accelerating incident investigation.
  • Automated response and containment. Through automation, our security systems can respond to threats in real time. For example, if our AI models detect suspicious activity on a device, automated workflows can immediately isolate the affected endpoint, block malicious traffic, or revoke access privileges, all without waiting for manual intervention. This rapid response capability is essential for minimizing the potential impact of attacks and protecting our critical assets.
  • Predictive analysis and proactive defense. We use predictive analytics to forecast emerging vulnerabilities before they can be exploited. By continuously training our models on the latest threat intelligence and attack patterns, we can anticipate risks and strengthen our defenses proactively—whether that means patching vulnerable systems, adjusting access controls, or updating our security policies.
  • User experience monitoring. We use AI to assess the real experience of our users, a critical measurement in a network environment where identity is the perimeter. By correlating performance metrics with security signals, we ensure that our security mechanisms don’t degrade productivity and that any anomalies impacting user experience are promptly addressed.
  • Continuous learning and improvement. Our AI and ML systems are designed to learn from every incident, adapt to new attack techniques, and evolve with the threat landscape. This continuous improvement loop enables our teams to stay ahead of sophisticated adversaries and maintain robust, resilient network security.
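The detection idea in the first bullet can be illustrated with a deliberately simple statistical detector. This z-score approach is a stand-in for the far richer ML models described above, not Microsoft's actual implementation; the signal (hourly failed-login counts) and threshold are assumptions for the example.

```python
import statistics

def anomalous_hours(counts, threshold=3.0):
    """Return indices of hours whose count deviates more than `threshold`
    standard deviations from the mean -- e.g. a burst of failed logins."""
    mean = statistics.fmean(counts)
    stdev = statistics.pstdev(counts)
    if stdev == 0:
        return []  # perfectly flat signal: nothing stands out
    return [i for i, c in enumerate(counts) if abs(c - mean) / stdev > threshold]
```

In production, a detector like this would be one feature among many; the value of the ML layer is correlating such spikes with device, identity, and data-transfer signals to cut false positives.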

Advanced threats require advanced responses. By integrating AI and ML into our network security strategies, we’re enhancing our ability to detect and respond to threats swiftly, minimize potential damage, and foster a secure environment for innovation and collaboration across our global hybrid infrastructure.

Isolating networks to minimize risk

In a hybrid infrastructure, isolating network segments is a foundational security principle. By segmenting networks, we limit the scope of potential breaches and reduce the risk of lateral movement by attackers. For example, separating employee productivity networks from customer-facing systems ensures that if a vulnerability is exploited in one area, it doesn’t cascade across the entire environment.

This is especially critical in environments where sensitive customer data and internal development systems coexist. Our testing and development environments must remain completely isolated—not only from customer-facing services but also from internal productivity tools like email, collaboration platforms, and identity systems. This prevents test code or experimental configurations from inadvertently exposing production systems to risk.

We also establish policy enforcement points (PEPs) within each network segment. These act as control gates, inspecting and filtering traffic between zones. By placing PEPs at strategic boundaries, we can tightly control what moves between segments and detect anomalies early. This architecture ensures that, if a breach occurs, the “blast radius”—the scope of impact—is minimal and contained.
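The PEP concept above can be sketched as a default-deny gate between segments. The segment names and allowed flows here are hypothetical, chosen to mirror the employee/customer/test separation described in this section.

```python
# Hypothetical allow-list of which traffic may cross segment boundaries.
ALLOWED_FLOWS = {
    ("corp", "prod"): {"https"},   # productivity -> production: HTTPS only
    ("prod", "corp"): set(),       # production never initiates traffic back
    ("test", "prod"): set(),       # test/dev stays fully isolated
}

def pep_permits(src_segment, dst_segment, protocol):
    """Default-deny gate: cross-segment traffic passes only via an explicit rule."""
    if src_segment == dst_segment:
        return True  # intra-segment traffic is governed by host policy, not the PEP
    return protocol in ALLOWED_FLOWS.get((src_segment, dst_segment), set())
```

Because undefined segment pairs fall through to an empty set, a newly added segment is isolated by default, which is exactly the "minimal blast radius" property the architecture aims for.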

This layered approach to segmentation and isolation is essential for maintaining the integrity of our production systems, minimizing risk, and ensuring that our hybrid infrastructure remains resilient in the face of evolving threats.

Embracing continuous monitoring 

We’ve stopped thinking of monitoring as a one-time check. Now, it’s a continuous conversation with our network.

A photo of Singh.

“Conventional network performance monitoring—monitoring the systems and infrastructure that support our network—can only tell part of the story. To truly understand and meet our requirements, we must monitor user experiences directly.”

Ragini Singh, partner group engineering manager in Microsoft Digital

Continuous monitoring is how we stay ahead of issues before they impact our people. It’s how we keep our hybrid infrastructure resilient, performant, and secure—every second of every day.

We’ve built a monitoring ecosystem that spans our entire global network from on-premises offices to cloud-based services in Azure and software-as-a-service (SaaS) platforms. With the mindset that identity is the new perimeter, we’re using signals from all aspects of our environment and focusing on the user experience.

“Conventional network performance monitoring—monitoring the systems and infrastructure that support our network—can only tell part of the story,” says Ragini Singh, a partner group engineering manager in Microsoft Digital. “To truly understand and meet our requirements, we must monitor user experiences directly.”

This isn’t just about tools and dashboards. It’s about insight. We’re using synthetic and native metrics to build a hop-by-hop view of the user experience. That lets us pinpoint where things go wrong—and fix them fast. We’re even layering in automation to enable self-healing responses when thresholds are breached.

Continuous monitoring is a strategic shift that helps us protect our people, power our services, and deliver the seamless experience our employees expect.

Looking to the future

As enterprises continue to navigate the complexities of hybrid infrastructures, securing enterprise networks requires an agile, multifaceted approach that integrates Zero Trust principles, identity-first security, and advanced technologies like AI and ML. By shifting the focus from traditional perimeter defenses to a more holistic and adaptive security model, organizations can better protect their assets, maintain operational continuity, and foster innovation in an increasingly interconnected world.

Implementing these strategies not only enhances security but also positions organizations to leverage the full potential of their hybrid infrastructures, driving growth and success in the digital age.

Key takeaways

Here are five key actions you can take to strengthen your organization's network security and embrace a modern approach:

  • Adopt an identity-first security model. Shift your focus from traditional perimeter-based defenses to verifying and securing every user and device identity—regardless of location or network.
  • Integrate AI and machine learning into your security strategy. Continuously improve your security posture by using intelligent automation and analytics to detect, respond to, and predict threats more effectively.
  • Isolate network segments to minimize risk. Separate critical business functions, customer-facing services, and development environments to contain threats and ensure that any potential breach remains limited in scope.
  • Implement continuous monitoring across your hybrid infrastructure. Move beyond periodic checks by establishing real-time, user-centric monitoring to maintain resilience, performance, and rapid incident response.
  • Embrace a proactive, adaptive mindset. Regularly update your security policies, train your teams, and stay agile to address emerging threats and support innovation as your organization evolves.

The post Securing the borderless enterprise: How we’re using AI to reinvent our network security appeared first on Inside Track Blog.

]]>
The $500-billion challenge: Inside the modernization of Microsoft Treasury’s backend infrastructure http://approjects.co.za/?big=insidetrack/blog/the-500-billion-challenge-inside-the-modernization-of-microsoft-treasurys-backend-infrastructure/ Thu, 19 Jun 2025 16:00:00 +0000 http://approjects.co.za/?big=insidetrack/blog/?p=19379 Editor’s note: This story was created with the help of artificial intelligence. To learn more about how Inside Track is using the power of generative AI to augment our human staff, see our story, Reimagining content creation with our Azure AI-powered Inside Track story bot. Engage with our experts! Customers or Microsoft account team representatives […]

The post The $500-billion challenge: Inside the modernization of Microsoft Treasury’s backend infrastructure appeared first on Inside Track Blog.

]]>
Editor’s note: This story was created with the help of artificial intelligence. To learn more about how Inside Track is using the power of generative AI to augment our human staff, see our story, Reimagining content creation with our Azure AI-powered Inside Track story bot.

When you’re responsible for processing over $500 billion in transactions every year, modernization becomes a mission-critical and highly delicate undertaking, where even the smallest misstep can carry serious financial consequences.

This was the challenge facing the Microsoft Treasury group, whose global operations relied on a patchwork of aging on-premises infrastructure, legacy systems, and leased lines.

Even the most mission-critical and entrenched systems must eventually evolve to meet the modern demands of speed, security, and scale.

A photo of Manikala.

“Modernizing the Treasury Service was not just about adopting new technology. It was about ensuring the uninterrupted operation of vital financial services while collaborating with various teams and meeting all security checks.”

Srinubabu Manikala, principal network engineering manager, Microsoft Digital

For Microsoft Treasury, that time had come—but instead of a straightforward infrastructure upgrade, the company seized the opportunity to go further.

What followed was a bold, strategic transformation that reinvented Microsoft Treasury’s core financial services, phasing out its aging infrastructure and migrating one of the world’s most complex treasury operations to the cloud.

“Modernizing the Treasury Service was not just about adopting new technology,” says Srinubabu Manikala, a principal network engineering manager in Microsoft Digital, the company’s IT organization. “It was about ensuring the uninterrupted operation of vital financial services while collaborating with various teams and meeting all security checks.”

A complex web of legacy, on-premises dependencies

Microsoft Treasury’s legacy infrastructure was initially built around a model where physical presence and dedicated hardware were the norm.

A photo of Shah.

“The legacy network architecture was heavily dependent on on-premises infrastructure and leased lines from third-party partners. This introduced constraints, making the environment complex and difficult to scale.”

Harsh Shah, senior service engineer, Microsoft Digital

It supported a vast network of over 80 banking partners across more than 110 countries, enabling essential financial functions like bank guarantees, supply chain financing, ledger updates, and global cash visibility.

This infrastructure complexity made modernization a highly challenging, costly, and risky endeavor—especially with an architecture that relied heavily on leased lines, aging hardware, and on-premises access methods.

“The legacy network architecture was heavily dependent on on-premises infrastructure and leased lines from third-party partners,” says Harsh Shah, a senior service engineer for Microsoft Digital. “This introduced constraints, making the environment complex and difficult to scale.”

For instance, the “Trading Room” required traders to be on-site to access treasury systems, a model that was quickly disrupted during the COVID-19 pandemic. The growing need for secure remote access only intensified these pressures, especially as seamless integration with cloud-first partners became critical, and downtime was a non-starter. Even brief outages risked financial penalties and could disrupt transactions worth billions.

Navigating the challenge

The Microsoft Digital team responsible for overseeing Microsoft Treasury’s network infrastructure proposed two potential architectural solutions that would meet their modernization requirements while enhancing network infrastructure.

A photo of Griffin.

“Our partners in Treasury ultimately chose the second option, transitioning to a hybrid network. They have a long-term goal of moving entirely to the cloud using Azure.”

Justin Griffin, principal group network engineering manager, Microsoft Digital

The first solution involved refreshing all on-premises infrastructure and implementing robust measures to ensure continuity of services during the transition—a costly but safe bet.

The second, more ambitious solution called for a phased transition to a hybrid network with a long-term goal: go fully cloud-native using Microsoft Azure.

“Our partners in Treasury ultimately chose the second option, transitioning to a hybrid network,” says Justin Griffin, a principal group network engineering manager in Microsoft Digital, who led the team responsible for getting the project off the ground. “They have a long-term goal of moving entirely to the cloud using Azure.”

The decision was influenced by several factors, including the need to eliminate costly hardware and the desire for streamlined network management processes, including the use of Azure for seamless integration with internal and external systems.

Implementing the solution

With the second option chosen, the implementation goals were to eliminate on-premises hardware, cut costs, simplify management, and empower team members and partners to access Microsoft Treasury’s network from anywhere, securely. To this end, Azure would become the new backbone for Microsoft Treasury’s infrastructure.

The modernization effort centered around two cornerstone projects—the SWIFT Alliance SaaS migration and the migration of BlackRock’s Aladdin platform into Azure. The projects would leverage services like Azure VPN for secure remote access, Azure Firewall for enhanced protection, and Azure Virtual WAN (vWAN) for seamless global connectivity.

Modernizing the SWIFT integration

Microsoft Treasury relies on SWIFT for secure international payments. Previously, access to SWIFT required the use of on-premises hardware security modules (HSMs) for attestation and encryption.

The modernization efforts followed a phased migration path:

  • Transitioning connectivity to Azure using vWAN and Site-to-Site VPNs
  • Maintaining security by peering cloud networks with on-prem HSMs
  • Eventually replacing on-premises HSMs with SWIFT’s SaaS-based attestation solution

The result was the retirement of leased lines and aging hardware, a reduced data center footprint, and cost savings of hundreds of thousands of dollars.

Aladdin secure remote access

To enable secure remote access to Aladdin—BlackRock’s investment management platform—Microsoft Digital collaborated with BlackRock and internal finance teams to implement a cloud-native Azure solution based on the following:

  • Azure vWAN hubs with Point-to-Site VPNs for private user access
  • Palo Alto Network virtual appliances for deep traffic monitoring
  • BGP peering over IPsec for encrypted data transfers
  • Geo-redundant routing for automatic failover in case of outages
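The geo-redundant routing in the last bullet boils down to always steering traffic to the highest-priority healthy endpoint and failing over automatically when health changes. The following sketch shows that selection logic only; the hub names and health data are hypothetical, and in the real deployment Azure vWAN performs this routing itself.

```python
# Hypothetical hub inventory with priority (lower number = preferred) and health.
HUBS = [
    {"name": "hub-west-europe", "priority": 1, "healthy": True},
    {"name": "hub-north-europe", "priority": 2, "healthy": True},
]

def select_hub(hubs):
    """Route to the most-preferred healthy hub; return None if all are down."""
    candidates = [h for h in hubs if h["healthy"]]
    if not candidates:
        return None
    return min(candidates, key=lambda h: h["priority"])["name"]
```

When the preferred hub's health flips to False, the next probe of this logic picks the secondary hub, which is the behavior that eliminated the link-failure outages described below.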

Before the migration, outages caused by link failures, power surges, and WAN disruptions were not uncommon. But with the new infrastructure in place, Treasury Services users gained secure, uninterrupted access to Aladdin from anywhere. The move to the cloud, reinforced by availability zones and built-in high availability, effectively put an end to those disruptions.

A team effort: Reducing project risks and bolstering communications

During the transition, the Microsoft Digital team, the Treasury Services team, and their financial partners all played critical roles in executing a highly coordinated and technically demanding transformation.

To maintain continuity, the Treasury Services team temporarily increased its budget to support parallel operations across both the legacy on-premises environment and the new Azure-based infrastructure.

A photo of Ramirez.

“We needed to make sure the communications were clear and acknowledged by each responsible individual to make sure no errors were made that compromised the availability of the system.”

Lionel Ramirez, senior technical program manager, Microsoft Digital Services

They also deployed new VPN clients to enable secure remote access and eventually migrated their HSMs handling critical SWIFT services to the SWIFT-hosted SaaS platform.

For financial partners, the migration meant shifting from traditional on-premises circuits to modern, cloud-based integrations with Azure. This required close collaboration across multiple internal and external teams. To support this shift, Microsoft Digital built a new Azure network infrastructure that integrated with legacy systems while laying the foundation for the fully cloud-hosted Treasury Services infrastructure.

“We needed to make sure the communications were clear and acknowledged by each responsible individual to make sure no errors were made that compromised the availability of the system,” says Lionel Ramirez, a senior technical program manager for Microsoft Digital Services.

Throughout the migration, the Microsoft Digital team ensured clear, continuous communication and required explicit acknowledgements for every critical step to minimize the risk of error and maintain service availability. All changes were carefully timed to occur after market hours and before trading activity resumed, further reducing the risk of disruption or financial penalties. The project team also adhered to stringent security and compliance requirements at every phase of the transition.

The results: Transformations that drive efficiency, security, and savings

By modernizing Microsoft Treasury Services’ network infrastructure—through migrating Aladdin to Azure and transitioning to SWIFT Alliance’s SaaS platform—the teams’ collaborative efforts achieved clear, measurable success.

These initiatives boosted operational efficiency, strengthened security, and unlocked greater flexibility, all while bringing significantly reduced costs:

  • Substantial cost savings: Over $1 million saved by eliminating the need for new network hardware and licenses.
  • Enhanced operational continuity: Azure’s dynamic failover eliminated outages caused by power surges or link failures.
  • Remote accessibility: Employees no longer need to be physically present in the Trading Room, with secure VPN access enabling global remote work.
  • Greater scalability and agility: Treasury services can now scale in real time to meet evolving partner demands.
  • Lower partner costs: Key financial partners like BlackRock were able to terminate expensive contracts for on-premises circuits, realizing further savings.
  • Lower environmental footprint: A smaller data center footprint reduced energy consumption and maintenance overhead.

By using Azure’s powerful capabilities, Treasury Services is well-prepared to navigate the complexities of today’s financial landscape, ensuring resilience and agility in a rapidly evolving, dynamic environment.

Looking ahead

The modernization of Microsoft Treasury’s network infrastructure is a powerful example of what digital transformation can achieve. While the immediate gains—cost savings, improved reliability, and increased efficiency—were substantial, the true value lies in what this transformation made possible.

“The transition to a cloud-based network using Azure has empowered the Treasury team with the ability to scale efficiently in response to partner-related changes or enhancements, thanks to being fully hosted in the cloud.”

Justin Griffin, principal group network engineering manager, Microsoft Digital

By migrating to Azure and retiring legacy systems, the Treasury Services group, in partnership with the Microsoft Digital team, is now equipped to navigate the evolving financial landscape with greater agility, resilience, and confidence. The project not only addressed technical debt but also laid the groundwork for future innovation.

With a fully cloud-hosted treasury network, Treasury Services can more easily onboard new financial services and partners, scale operations on demand, and take full advantage of Azure’s built-in monitoring and security tools.

“The transition to a cloud-based network using Azure has empowered the Treasury team with the ability to scale efficiently in response to partner-related changes or enhancements, thanks to being fully hosted in the cloud,” Griffin says. “My team can now seamlessly adjust the Azure cloud network infrastructure to meet the Treasury team’s evolving demands and business needs.”

This success story also illustrates the impact of strategic collaboration, deliberate planning, and cutting-edge technology. It proves that even the most complex, deeply embedded financial systems—ones that move hundreds of billions of dollars—can be reinvented. What began as a high-stakes infrastructure challenge has become a model for future transformation.

Microsoft Treasury’s network infrastructure modernization isn’t just a technical achievement; it’s a blueprint for how organizations can evolve. The ultimate goal is a world where eliminating the legacy burden, embracing the cloud, and meeting high standards for speed, security, and scalability is the norm, not the exception.

Key takeaways

Here are some of our top insights from moving Microsoft Treasury Services’ network infrastructure to Azure:

  • Embrace cloud migration as an achievable goal: Microsoft Treasury Services, in partnership with the Microsoft Digital team, overcame a significant IT challenge by transitioning from an on-premises system to a cloud-based network using Azure.
  • Untangle complexity: By moving to Azure, Microsoft Treasury Services, in partnership with the Microsoft Digital team, eliminated the need for on-premises hardware, significantly reducing system complexity and network maintenance requirements.
  • Create an adaptable partner ecosystem: In an environment where partners and providers increasingly operate in the cloud, the transition bolstered service continuity for critical financial functions and enabled remote access to financial services.
  • Modernization saves time and money: The modernization resulted in substantial cost savings exceeding $1 million, and annual savings of approximately 200 hours in management time.
  • Embrace migration challenges as opportunities: Microsoft Treasury Services looks forward to using Azure’s robust infrastructure to boost agility, cut costs, and fuel future innovation. Each opportunity to upgrade is a chance to innovate.

The post The $500-billion challenge: Inside the modernization of Microsoft Treasury’s backend infrastructure appeared first on Inside Track Blog.

]]>
How Microsoft kept its underwater datacenter connected while retrieving it from the ocean http://approjects.co.za/?big=insidetrack/blog/how-microsoft-kept-its-underwater-datacenter-connected-while-retrieving-it-from-the-ocean/ Mon, 21 Apr 2025 14:05:03 +0000 http://approjects.co.za/?big=insidetrack/blog/?p=5878 Editor’s note: This story was first published in 2020. We periodically update our stories, but we can’t verify that they represent the full picture of our current situation at Microsoft. We leave them on the site so you can see what our thinking and experience was at the time. When Microsoft announced its plan to […]

The post How Microsoft kept its underwater datacenter connected while retrieving it from the ocean appeared first on Inside Track Blog.

]]>
Editor’s note: This story was first published in 2020. We periodically update our stories, but we can’t verify that they represent the full picture of our current situation at Microsoft. We leave them on the site so you can see what our thinking and experience was at the time.

When Microsoft announced its plan to build an underwater datacenter, Lathish Kumar Chaparala was excited.

“During the initial rollout of Project Natick, I used to log on to their website and watch the live feed of the underwater camera that was mounted on the datacenter,” says Chaparala, a senior program manager on the networking team in Microsoft Digital, the engineering organization at Microsoft that builds and manages the products, processes, and services that Microsoft runs on.

Little did he know that he and his team would later be brought in to extend the network connectivity of this underwater datacenter so it could be safely fished out of the sea.

But the story begins much earlier than that.

We saw the potential benefit [of developing an underwater datacenter] to the industry and Microsoft. People responded to our work as if we were going to the moon. In our eyes, we were just fulfilling our charter—taking on challenging problems and coming up with solutions.

– Mike Shepperd, senior research and development engineer on the Microsoft Research team

The idea of an underwater datacenter came out of ThinkWeek, a Microsoft event where employees shared out-of-the-box ideas that they thought the company should pursue. One creative idea was put forth by employees Sean James and Todd Rawlings, who proposed building an underwater datacenter powered by renewable ocean energy that would provide super-fast cloud services to crowded coastal populations.

Their idea appealed to Norm Whitaker, who led special projects for Microsoft Research at the time.

Out of this, Project Natick was born.

Mike Shepperd and Samuel Ogden stand in the power substation.
Shepperd (right) and Samuel Ogden test the underwater datacenter from the power substation where the datacenter connects to land, just off the coast of the Orkney Islands. (Photo by Scott Eklund | Red Box Pictures)

“Norm’s team was responsible for making the impossible possible, so he started exploring the viability of an underwater datacenter that could be powered by renewable energy,” says Mike Shepperd, a senior research and development engineer on the Microsoft Research team who was brought on to support research on the feasibility of underwater datacenters.

It quickly became a Microsoft-wide effort that spanned engineering, research, and IT.

“We saw the potential benefit to the industry and Microsoft,” Shepperd says. “People responded to our work as if we were going to the moon. In our eyes, we were just fulfilling our charter—taking on challenging problems and coming up with solutions.”

Researchers on the project hypothesized that having a sealed container on the ocean floor with a low-humidity nitrogen environment and cold, stable temperatures would better protect the servers and increase reliability.

“Once you’re down 20 to 30 meters into the water, you’re out of the weather,” Shepperd says. “You could have a hurricane raging above you, and an underwater datacenter will be none the wiser.”

Internal engineering team steps up

The Project Natick team partnered with networking and security teams in Microsoft Digital and Arista to create a secure wide-area network (WAN) connection from the underwater datacenter to the corporate network.

“We needed the connectivity that they provided to finish off our project in the right way,” Shepperd says. “We also needed that connectivity to support the actual decommissioning process, which was very challenging because we had deployed the datacenter in such a remote location.”

In the spring of 2018, they deployed a fully connected and secure datacenter 117 feet below sea level in the Orkney Islands, just off the coast of Scotland. After it was designed, set up, and gently lowered onto the seabed, the goal was to leave it untouched for two years. Chakri Thammineni, a network engineer in Microsoft Digital, supported these efforts.

Chakri Thammineni sits next to his desk and smiles at the camera. His monitor reads “Project Natick– Network Solution.”
Chakri Thammineni, a network engineer at Microsoft Digital, and his team came up with a network redesign to extend the network connectivity of the underwater datacenter. (Photo submitted by Chakri Thammineni | Inside Track)

“Project Natick was my first engagement after I joined Microsoft, and it was a great opportunity to collaborate with many folks to come up with a network solution,” Thammineni says.

Earlier this year, the experiment concluded without interruption. And yes, the team learned that placing a datacenter underwater is indeed a more sustainable and efficient way to bring the cloud to coastal areas, providing better datacenter responsiveness.

With the experiment ending, the team needed to recover the datacenter so it could analyze all the data collected during its time underwater.

That’s where Microsoft’s internal engineering teams came in.

“To make sure we didn’t lose any data, we needed to keep the datacenter connected to Microsoft’s corporate network during our extraction,” Shepperd says. “We accomplished this with a leased line dedicated to our use, one that we used to connect the datacenter with our Microsoft facility in London.”

The extraction also had to be timed just right for the same reasons.

“The seas in Orkney throw up waves that can be as much as 9 to 10 meters high for most of the year,” he says. “The team chose this location because of the extreme conditions, reasoning it was a good place to demonstrate the ability to deploy Natick datacenters just about anywhere.”

And then, like it has for so many other projects, COVID-19 forced the team to change its plans. In the process of coming up with a new datacenter recovery plan, the team realized that the corporate connectivity was being shut down at the end of May 2020 and couldn’t be extended.

“Ordering the gear would’ve taken two to three months, and we were on a much shorter timeline,” Chaparala says.

Shepperd called on the team in Platform Engineering, a division of Microsoft Digital, to quickly remodel the corporate connectivity from the Microsoft London facility to the Natick shore area, all while ensuring that the connection was secured.

The mission?

Ensure that servers were online until the datacenter could be retrieved from the water, all without additional hardware.

Lathish Chaparala sits with his laptop in front of him and looks at the camera.
Lathish Kumar Chaparala, a senior program manager on the networking team in Microsoft Digital, helped extend network connectivity of Microsoft’s underwater datacenter so it could be safely retrieved from the sea. (Photo submitted by Lathish Kumar Chaparala | Inside Track)

“My role was to make sure I understood the criticality of the request in terms of timeline, and to pull in the teams and expertise needed to keep the datacenter online until it was safely pulled out of the water,” Chaparala says.

The stakes were high, especially with the research that was on the line.

“If we lost connectivity and shut down the datacenter, it could have compromised the viability of the research we had done up until that point,” Shepperd says.

A seamless collaboration across Microsoft Research and IT

To solve this problem, the teams in Core Platform Engineering and Microsoft Research had to align their vision and workflows.

“Teams in IT might plan their work out for months or years in advance,” Shepperd says. “Our research is on a different timeline because we don’t know where technology will take us, so we needed to work together, and fast.”

Because they couldn’t bring any hardware to the datacenter site, Chaparala, Thammineni, and the Microsoft Research team needed to come up with a network redesign. This led to the implementation of software-based encryption using a virtual network operating system on Windows virtual machines.

It’s exciting to play a role in bringing the right engineers and program managers together for a common goal, especially so quickly. Once we had the right team, we knew there was nothing we couldn’t handle.

– Chakri Thammineni, a network engineer in Microsoft Digital

With this solution in tow, the team could extend the network connectivity from the Microsoft Docklands facility in London to the Natick datacenter off the coast of Scotland.

“Chakri and Lathish have consistently engaged with us to fill the gaps between what our research team knew and what these networking experts at Microsoft needed in order to take action on the needs of this project,” Shepperd says. “Without help from their teams, we would not have been able to deliver on our research goals as quickly and efficiently as we did.”

Lessons learned from the world’s second underwater datacenter

The research on Project Natick pays dividends in Microsoft’s future work, particularly around running more sustainable datacenters that could power Microsoft Azure cloud services.

“Whether a datacenter is on land or in water, the size and scale of Project Natick is a viable blueprint for datacenters of the future,” Shepperd says. “Instead of putting down acres of land for datacenters, our customers and competitors are all looking for ways to power their compute and to house storage in a more sustainable way.”

This experience taught Chaparala to assess the needs of his partner teams.

“We work with customers to understand their requirements and come up with objectives and key results that align,” Chaparala says.

Ultimately, Project Natick’s story is one of cross-disciplinary collaboration – and just in the nick of time.

“It’s exciting to play a role in bringing the right engineers and program managers together for a common goal, especially so quickly,” Chaparala says. “Once we had the right team, we knew there was nothing we couldn’t handle.”

Related links

The post How Microsoft kept its underwater datacenter connected while retrieving it from the ocean appeared first on Inside Track Blog.

Enhancing VPN performance at Microsoft http://approjects.co.za/?big=insidetrack/blog/enhancing-vpn-performance-at-microsoft/ Sun, 26 Jan 2025 17:00:13 +0000 http://approjects.co.za/?big=insidetrack/blog/?p=8569 [Editor’s note: This content was written to highlight a particular event or moment in time. Although that moment has passed, we’re republishing it here so you can see what our thinking and experience was like at the time.] Modern workers are increasingly mobile and require the flexibility to get work done outside of the office. […]

The post Enhancing VPN performance at Microsoft appeared first on Inside Track Blog.

[Editor’s note: This content was written to highlight a particular event or moment in time. Although that moment has passed, we’re republishing it here so you can see what our thinking and experience was like at the time.]

Modern workers are increasingly mobile and require the flexibility to get work done outside of the office. Here at Microsoft headquarters in the Puget Sound area of Washington State, every weekday an average of 45,000 to 55,000 Microsoft employees use a virtual private network (VPN) connection to remotely connect to the corporate network. As part of our overall Zero Trust Strategy, we have redesigned our VPN infrastructure, something that has simplified our design and let us consolidate our access points. This has enabled us to increase capacity and reliability, while also reducing reliance on VPN by moving services and applications to the cloud.

Providing a seamless remote access experience

Remote access at Microsoft is reliant on the VPN client, our VPN infrastructure, and public cloud services. We have had several iterative designs of the VPN service inside Microsoft. Regional weather events in the past required large increases in employees working from home, heavily taxing the VPN infrastructure and requiring a completely new design. Three years ago, we built an entirely new VPN infrastructure, a hybrid design, using Microsoft Azure Active Directory (Azure AD) load balancing and identity services with gateway appliances across our global sites.

Key to our success in the remote access experience was our decision to deploy a split-tunneled configuration for the majority of employees. We have migrated nearly 100% of previously on-premises resources into Microsoft Azure and Microsoft Office 365. Our continued efforts in application modernization are reducing the traffic on our private corporate networks as cloud-native architectures allow direct internet connections. The shift to internet-accessible applications and a split-tunneled VPN design has dramatically reduced the load on VPN servers in most areas of the world.

Using VPN profiles to improve the user experience

We use Microsoft Endpoint Manager to manage our domain-joined and Microsoft Azure AD–joined computers and mobile devices that have enrolled in the service. In our configuration, VPN profiles are replicated through Microsoft Intune and applied to enrolled devices; these include certificates that we issue through Configuration Manager for Windows 10 devices. We support Mac and Linux device VPN connectivity with a third-party client using SAML-based authentication.

We use certificate-based authentication (public key infrastructure, or PKI) and multi‑factor authentication solutions. When employees first use the Auto-On VPN connection profile, they are prompted to authenticate strongly. Our VPN infrastructure supports Windows Hello for Business and Multi-Factor Authentication. It stores a cryptographically protected certificate upon successful authentication that allows for either persistent or automatic connection.

For more information about how we use Microsoft Intune and Endpoint Manager as part of our device management strategy, see Managing Windows 10 devices with Microsoft Intune.

Configuring and installing VPN connection profiles

We created VPN profiles that contain all the information a device requires to connect to the corporate network, including the supported authentication methods and the VPN gateways that the device should connect to. We created the connection profiles for domain-joined and Microsoft Intune–managed devices using Microsoft Endpoint Manager.

For more information about creating VPN profiles, see VPN profiles in Configuration Manager and How to Create VPN Profiles in Configuration Manager.

The Microsoft Intune custom profile for Intune-managed devices uses Open Mobile Alliance Uniform Resource Identifier (OMA-URI) settings with XML data type, as illustrated below.

Creating a Profile XML and editing the OMA-URI settings to create a connection profile in System Center Configuration Manager.
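As a rough illustration, a custom configuration like this can also be created programmatically. The sketch below builds an Intune custom-profile payload of the kind accepted by the Microsoft Graph deviceConfigurations endpoint; the profile name, server, and XML body are placeholders, not our production values.

```python
import json

# Hypothetical VPN ProfileXML body; a real profile carries servers,
# authentication methods, and routing policy (values here are placeholders).
PROFILE_XML = """<VPNProfile>
  <NativeProfile>
    <Servers>vpn.example.com</Servers>
    <NativeProtocolType>Automatic</NativeProtocolType>
  </NativeProfile>
</VPNProfile>"""

def build_custom_profile(profile_name: str, profile_xml: str) -> dict:
    """Build an Intune custom-configuration payload that delivers a
    VPNv2 ProfileXML through an OMA-URI setting (illustrative sketch)."""
    return {
        "@odata.type": "#microsoft.graph.windows10CustomConfiguration",
        "displayName": f"{profile_name} VPN profile",
        "omaSettings": [
            {
                "@odata.type": "#microsoft.graph.omaSettingString",
                "displayName": "VPN ProfileXML",
                # Node path follows the Windows VPNv2 CSP layout.
                "omaUri": f"./Device/Vendor/MSFT/VPNv2/{profile_name}/ProfileXML",
                "value": profile_xml,
            }
        ],
    }

payload = build_custom_profile("CorpVPN", PROFILE_XML)
print(json.dumps(payload, indent=2))
```

In practice this payload would be POSTed to the Graph deviceConfigurations endpoint and assigned to a device group; the sketch only shows the shape of the setting.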

Installing the VPN connection profile

The VPN connection profile is installed using a script on domain-joined computers running Windows 10, through a policy in Endpoint Manager.

For more information about how we use Microsoft Intune as part of our mobile device management strategy, see Mobile device management at Microsoft.

Conditional Access

We use an optional feature that checks device health and corporate policy compliance before allowing a device to connect. Conditional Access is supported with connection profiles, and we’ve started using this feature in our environment.

Rather than just relying on the managed device certificate for a “pass” or “fail” for VPN connection, Conditional Access places machines in a quarantined state while checking for the latest required security updates and antivirus definitions to help ensure that the system isn’t introducing risk. On every connection attempt, the system health check looks for a certificate confirming that the device is still compliant with corporate policy.

Certificate and device enrollment

We use an Azure AD certificate for single sign-on to the VPN connection profile. And we currently use Simple Certificate Enrollment Protocol (SCEP) and Network Device Enrollment Service (NDES) to deploy certificates to our mobile devices via Microsoft Endpoint Manager. The SCEP certificate we use covers both wireless and VPN. NDES allows software on routers and other network devices running without domain credentials to obtain certificates based on SCEP.

NDES performs the following functions:

  1. It generates and provides one-time enrollment passwords to administrators.
  2. It submits enrollment requests to the certificate authority (CA).
  3. It retrieves enrolled certificates from the CA and forwards them to the network device.

For more information about deploying NDES, including best practices, see Securing and Hardening Network Device Enrollment Service for Microsoft Intune and System Center Configuration Manager.
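The three NDES responsibilities listed above amount to a small enrollment hand-off, which the sketch below models in plain Python. This is an illustrative stand-in, not a real NDES or certificate authority interface; all class and method names are ours.

```python
import secrets
from dataclasses import dataclass, field

@dataclass
class NdesSketch:
    """Toy model of the NDES enrollment hand-off (illustrative only)."""
    issued: dict = field(default_factory=dict)
    passwords: set = field(default_factory=set)

    def generate_enrollment_password(self) -> str:
        # 1. Generate a one-time enrollment password for an administrator.
        otp = secrets.token_hex(8)
        self.passwords.add(otp)
        return otp

    def submit_enrollment_request(self, otp: str, csr: str) -> str:
        # 2. A valid OTP lets the request through to the CA; each is single-use.
        if otp not in self.passwords:
            raise PermissionError("unknown or reused enrollment password")
        self.passwords.remove(otp)
        self.issued[csr] = f"CERT({csr})"  # stand-in for the CA-signed cert
        return csr

    def retrieve_certificate(self, csr: str) -> str:
        # 3. Retrieve the enrolled certificate and forward it to the device.
        return self.issued[csr]
```

The single-use password check is the key property: a captured or replayed enrollment password cannot be used to obtain a second certificate.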

VPN client connection flow

The diagram below illustrates the VPN client-side connection flow.

A graphic representation of the client connection workflow. Sections shown are client components, Azure components, and site components.
The client-side VPN connection flow.

When a device-compliance–enabled VPN connection profile is triggered (either manually or automatically):

  1. The VPN client calls into the Windows 10 Azure AD Token Broker on the local device and identifies itself as a VPN client.
  2. The Azure AD Token Broker authenticates to Azure AD and provides it with information about the device trying to connect. A device check is performed by Azure AD to determine whether the device complies with our VPN policies.
  3. If the device is compliant, Azure AD requests a short-lived certificate. If the device isn’t compliant, we perform remediation steps.
  4. Azure AD pushes down a short-lived certificate to the Certificate Store via the Token Broker. The Token Broker then returns control back over to the VPN client for further connection processing.
  5. The VPN client uses the Azure AD–issued certificate to authenticate with the VPN gateway.

Remote access infrastructure

At Microsoft, we have designed and deployed a hybrid infrastructure to provide remote access for all the supported operating systems—using Azure for load balancing and identity services and specialized VPN appliances. We had several considerations when designing the platform:

  • Redundancy. The service needed to be highly resilient so that it could continue to operate if a single appliance, site, or even large region failed.
  • Capacity. As a worldwide service meant to be used by the entire company and to handle the expected growth of VPN, the solution had to be sized with enough capacity to handle 200,000 concurrent VPN sessions.
  • Homogenized site configuration. A standard hardware and configuration stamp was a necessity both for initial deployment and operational simplicity.
  • Central management and monitoring. We ensured end-to-end visibility through centralized data stores and reporting.
  • Azure AD–based authentication. We moved away from on-premises Active Directory and used Azure AD to authenticate and authorize users.
  • Multi-device support. We had to build a service that could be used by as much of the ecosystem as possible, including Windows, OSX, Linux, and appliances.
  • Automation. Being able to programmatically administer the service was critical. It needed to work with existing automation and monitoring tools.

When we were designing the VPN topology, we considered the location of the resources that employees were accessing when they were connected to the corporate network. If most of the connections from employees at a remote site were to resources located in central datacenters, more consideration was given to bandwidth availability and connection health between that remote site and the destination. In some cases, additional network bandwidth infrastructure has been deployed as needed. The illustration below provides an overview of our remote access infrastructure.

VPN infrastructure. Diagram shows the connection from the internet to Azure traffic manager profiles, then to the VPN site.
Microsoft remote access infrastructure.

VPN tunnel types

Our VPN solution provides network transport over Secure Sockets Layer (SSL). The VPN appliances force Transport Layer Security (TLS) 1.2 for SSL session initiation, and the strongest possible cipher suite negotiated is used for the VPN tunnel encryption. We use several tunnel configurations depending on the locations of users and level of security needed.
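Enforcing a TLS 1.2 floor is typically a one-line setting in a TLS stack. The snippet below shows the equivalent setting in Python’s standard ssl module; it illustrates the policy, not the appliances’ actual configuration.

```python
import ssl

# Build a client context that refuses anything older than TLS 1.2,
# mirroring the floor the VPN appliances enforce for session initiation.
ctx = ssl.create_default_context()
ctx.minimum_version = ssl.TLSVersion.TLSv1_2

# The strongest mutually supported cipher suite is then negotiated at
# handshake time from this context's cipher list.
print(ctx.minimum_version)
```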

Split tunneling

Split tunneling allows only the traffic destined for the Microsoft corporate network to be routed through the VPN tunnel, and all internet traffic goes directly through the internet without traversing the VPN tunnel or infrastructure. Our migration to Office 365 and Azure has dramatically reduced the need for connections to the corporate network. We rely on the security controls of applications hosted in Azure and services of Office 365 to help secure this traffic. For endpoint protection, we use Microsoft Defender Advanced Threat Protection on all clients. In our VPN connection profile, split tunneling is enabled by default and used by the majority of Microsoft employees. Learn more about Office 365 split tunnel configuration.
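The routing decision at the heart of split tunneling can be sketched simply: destinations inside corporate prefixes ride the tunnel, everything else goes direct. The prefixes below are illustrative placeholders, not our actual route list.

```python
from ipaddress import ip_address, ip_network

# Hypothetical corporate prefixes; a real split-tunnel profile lists the
# routes that should traverse the tunnel, with everything else going direct.
CORP_ROUTES = [ip_network("10.0.0.0/8"), ip_network("192.168.0.0/16")]

def route_for(destination: str) -> str:
    """Return 'tunnel' for corporate destinations, 'direct' otherwise."""
    addr = ip_address(destination)
    if any(addr in net for net in CORP_ROUTES):
        return "tunnel"
    return "direct"
```

Because cloud-hosted services resolve to public addresses outside the corporate prefixes, their traffic never touches the VPN infrastructure — which is exactly why the cloud migration reduced VPN load so dramatically.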

Full tunneling

Full tunneling routes and encrypts all traffic through the VPN. There are some countries and business requirements that make full tunneling necessary. This is accomplished by running a distinct VPN configuration on the same infrastructure as the rest of the VPN service. A separate VPN profile is pushed to the clients who require it, and this profile points to the full-tunnel gateways.

Full tunnel with high security

Our IT employees and some developers access company infrastructure or extremely sensitive data. These users are given Privileged Access Workstations, which are secured, limited, and connect to a separate highly controlled infrastructure.

Applying and enforcing policies

In Microsoft Digital, the Conditional Access administrator is responsible for defining the VPN Compliance Policy for domain-joined Windows 10 desktops, including enterprise laptops and tablets, within the Microsoft Azure Portal administrative experience. This policy is then published so that the enforcement of the applied policy can be managed through Microsoft Endpoint Manager. Microsoft Endpoint Manager provides policy enforcement, as well as certificate enrollment and deployment, on behalf of the client device.

For more information about policies, see VPN and Conditional Access.

Early adopters help validate new policies

With every new Windows 10 update, we rolled out a pre-release version to a group of about 15,000 early adopters a few months before its release. Early adopters validated the new credential functionality and used remote access connection scenarios to provide valuable feedback that we could take back to the product development team. Using early adopters helped validate and improve features and functionality, influenced how we prepared for the broader deployment across Microsoft, and helped us prepare support channels for the types of issues that employees might experience.

Measuring service health

We measure many aspects of the VPN service and report on the number of unique users that connect every month, the number of daily users, and the duration of connections. We have invested heavily in telemetry and automation throughout the Microsoft network environment. Telemetry allows for data-driven decisions in making infrastructure investments and identifying potential bandwidth issues ahead of saturation.

Using Power BI to customize operational insight dashboards

Our service health reporting is centralized using Power BI dashboards to display consolidated data views of VPN performance. Data is aggregated into an Azure SQL data warehouse from VPN appliance logging, network device telemetry, and anonymized device performance data. These dashboards, shown in the next two graphics below, are tailored for the teams using them.

A map is shown with icons depicting the status of each VPN site globally. All are in a good state.
Global VPN status dashboard.

Six graphs are shown to share VPN performance reporting dashboards. They include peak internet usage, peak VPN bandwidth, and peak VPN concurrent sessions.
Microsoft Power BI reporting dashboards.

Key Takeaways

With our optimizations in VPN connection profiles and improvements in the infrastructure, we have seen significant benefits:

  • Reduced VPN requirements. By moving to cloud-based services and applications and implementing split tunneling configurations, we have dramatically reduced our reliance on VPN connections for many users at Microsoft.
  • Auto-connection for improved user experience. VPN connection profiles, automatically configured for connection and authentication types, have improved mobile productivity. They also improve the user experience by giving employees the option to stay connected to VPN—without additional interaction after signing in.
  • Increased capacity and reliability. Reducing the quantity of VPN sites and investing in dedicated VPN hardware has increased our capacity and reliability, now supporting over 500,000 simultaneous connections.
  • Service health visibility. By aggregating data sources and building a single pane of glass in Microsoft Power BI, we have visibility into every aspect of the VPN experience.

Related links

The post Enhancing VPN performance at Microsoft appeared first on Inside Track Blog.

Finding and fixing network outages in minutes—not hours—with real-time telemetry at Microsoft http://approjects.co.za/?big=insidetrack/blog/finding-and-fixing-network-outages-in-minutes-not-hours-with-real-time-telemetry-at-microsoft/ Thu, 29 Aug 2024 15:00:00 +0000 http://approjects.co.za/?big=insidetrack/blog/?p=16333 With more than 600 physical worksites around the world, Microsoft has one of the largest network infrastructure footprints on the planet. Managing the thousands of devices that keep those locations connected demands constant attention from a global team of network engineers. It’s their job to monitor and maintain those devices. And when outages occur, they […]

The post Finding and fixing network outages in minutes—not hours—with real-time telemetry at Microsoft appeared first on Inside Track Blog.


With more than 600 physical worksites around the world, Microsoft has one of the largest network infrastructure footprints on the planet.

Managing the thousands of devices that keep those locations connected demands constant attention from a global team of network engineers. It’s their job to monitor and maintain those devices. And when outages occur, they lead the charge to repair and remediate the situation.

To support their work, our Real Time Telemetry team at Microsoft Digital, the company’s IT organization, has introduced new capabilities that help engineers identify network device outages and capture data faster and more extensively than ever before. Through real-time telemetry, network engineers can isolate and remediate issues in minutes—not hours—to keep their colleagues productive and our technology running smoothly.

Immediacy is everything

Dave, Sinha, Vijay, and Menten pose for pictures that have been assembled into a collage.
Aayush Dave, Astha Sinha, Abhijit Vijay, Daniel Menten, and Martin O’Flaherty (not pictured) are part of the Microsoft Digital Real Time Telemetry team enabling more up-to-date and extensive network device data.

Conventional network monitoring uses the Simple Network Management Protocol (SNMP) architecture, which retrieves network telemetry through periodic, pull-based polls and other legacy technologies. At Microsoft, that polling interval typically ranges between five minutes and six hours.

SNMP is a foundational telemetry architecture with decades of legacy. It’s ubiquitous, but it doesn’t allow for the most up-to-date data possible.

“The biggest pain point we’ve always heard from network engineers is latency in the data,” says Astha Sinha, senior product manager for the Infrastructure and Engineering Services team in Microsoft Digital. “When data is stale, engineers can’t react quickly to outages, and that has implications for security and productivity.”

Serious vulnerabilities and liabilities arise when a network device outage occurs. But because of lags between polling intervals, a network engineer might not receive information or alerts about the situation until long after it happens.

We assembled the Real Time Telemetry team as part of our Infrastructure and Engineering Services to close that gap.

“We build the tools and automations that network engineers use to better manage their networks,” says Martin O’Flaherty, principal product manager for the Infrastructure and Engineering Services team in Microsoft Digital. “To do that, we need to make sure they have the right signals as early and as consistently as possible.”

The technology that powers these possibilities is known as streaming telemetry. It relies on network devices compatible with the more modern gRPC Network Management Interface (gNMI) telemetry protocol and other technologies to support a push-based approach to network monitoring where network devices stream data constantly.

This architecture isn’t new, but our team is scaling and programmatizing how that data becomes available by creating a real-time telemetry apparatus that collects, stores, and delivers network information to service engineers. These capabilities offer several benefits.

The advantages of real-time network device telemetry

  • Superior anomaly detection, reduced intent and configuration drift, a foundation for large-scale automation, and less network downtime.

  • Better detection of breaches, vulnerabilities, and bugs through automated scans for OS stalls, lateral device hijacking, malware, and other common vulnerabilities.

  • Visibility into real-time utilization data on network device stats, as well as steady replacement of current data collection technology and more scalable network growth and evolution.

  • More rapid network fixes, leading to a reduction in baseline time-to-detection and time-to-mitigation for incidents.
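The difference between pull-based polling and push-based streaming comes down to worst-case detection latency, which a back-of-envelope calculation makes concrete. The interval values here are illustrative, not measured figures from our network.

```python
# With pull-based polling, an event can go unnoticed for up to a full
# polling interval; with push-based streaming, the wait is bounded by
# the (much shorter) export cadence.
def worst_case_detection_delay(event_offset_s: float, interval_s: float) -> float:
    """Seconds until the next collection after an event at event_offset_s."""
    return interval_s - (event_offset_s % interval_s)

# Event occurs 10 s after the last collection cycle:
poll_delay = worst_case_detection_delay(10.0, 300.0)  # 5-minute SNMP poll
stream_delay = worst_case_detection_delay(10.0, 1.0)  # ~1 s streaming export
print(poll_delay, stream_delay)
```

An outage just after a five-minute poll sits invisible for nearly five minutes; a streaming subscription surfaces it within seconds, which is the gap the Real Time Telemetry team set out to close.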

“Devices are proactively sending data without having to wait for requests, so they function more efficiently and facilitate timely troubleshooting and optimization,” says Abhijit Vijay, principal software engineering manager with the Infrastructure and Engineering Services team in Microsoft Digital. “Since this approach pushes data continuously rather than at specific intervals, it also reduces the additional network traffic and scales better in larger, more complex environments.”

At any given time, Microsoft operates 25,000 to 30,000 network devices, managed by engineers working across 10 different service lines. Accounting for all their needs while keeping data collection manageable and efficient requires extensive collaboration and prioritization.

We also had to account for compatibility. With so many network devices in operation, replacement lifecycles vary. Not all of them are currently gNMI-compatible.

Working with our service lines, we identified the use cases that would provide the best possible ROI, largely based on where we would find the greatest benefits for security and where networks offered a meaningful number of gNMI-compatible devices. We also zeroed in on the types of data that would be the most broadly useful. Being selective helped us preserve resources and avoid overwhelming engineers with too much data.

We built our internal solution entirely using Azure components, including Azure Functions, Azure Kubernetes Service (AKS), Azure Cosmos DB, Redis, and Azure Data Lake. The result is a platform that network engineers can use to access real-time telemetry data.

With key service lines, use cases, and a base of technology in place, we worked with network engineers to onboard the relevant devices. From there, their service lines were free to experiment with our solution on real-world incidents.

Better response times, greater network reliability

Service lines are already experiencing big wins.

In one case, a heating and cooling system went offline for a building in the company’s Millennium Campus in Redmond, Washington. A lack of environmental management has the potential to cause structural damage to buildings if left unchecked, so it was important to resolve this issue as quickly as possible. The service line for wired onsite connections sprang into action as soon as they received a network support ticket.

With real-time telemetry enabled, the team created a Kusto query to compare DOT1X access-session data for the day of the outage with a period before the outage started. Almost immediately, they spotted problematic VLAN switching, including the exact time and duration of the outage. By correlating the timestamps, they determined that the RADIUS registrations of the device owner had expired, which caused the devices to switch into the guest network as part of the zero-trust network implementation.

As a result, the team was able to resolve the registration issues and restore the heating and cooling systems in 10 minutes—a process that might have taken hours using other collection methods due to the lag-time between polling intervals.
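The diagnostic logic amounts to a diff of session state before and during the outage. The team ran it as a Kusto query in practice; the sketch below expresses the same comparison in Python, with illustrative device and VLAN names.

```python
# Line up access-session records from before and during the outage and
# flag devices whose VLAN assignment flipped (names are illustrative).
def vlan_changes(before: dict, during: dict) -> dict:
    """Map device -> (old_vlan, new_vlan) for devices that moved VLANs."""
    return {
        dev: (before[dev], during[dev])
        for dev in before
        if dev in during and before[dev] != during[dev]
    }

before = {"hvac-ctrl-01": "corp", "printer-07": "corp"}
during = {"hvac-ctrl-01": "guest", "printer-07": "corp"}
# hvac-ctrl-01 dropped to the guest VLAN -> expired device registration.
print(vlan_changes(before, during))
```

Correlating the timestamp of that flip with the start of the support ticket is what let the team pin the outage on expired RADIUS registrations within minutes.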

“This has the potential to improve alerting, reduce outages, and enhance security,” says Daniel Menten, senior cloud network engineer for site infrastructure management on the Site Wired team. “One of the benefits of real-time telemetry is that it lets us capture information that wasn’t previously available—or that we received too slowly to take action.”

It’s about speeding up how we identify issues and how we then respond to them.  

“With this level of observability, engineers that monitor issues and outages benefit from enhanced experiences,” says Aayush Dave, a product manager on the Infrastructure and Engineering Services team in Microsoft Digital. “And that’s going to make our network more reliable and performant in a world where security issues and outages can have a global impact.”

The future is in real time

Now that real-time telemetry has demonstrated its value, our efforts are focused on broadening and deepening the experience.

“More devices mean more impact,” Dave says. “By increasing the number of network devices that facilitate real-time telemetry, we’re giving our engineers the tools to accelerate their response to these incidents and outages, all leading to enhanced performance and a more robust network reliability posture.”

It’s also about layering on new ways of accessing and using the data.

We’ve just released a preview UI that provides a quick look at essential data, as well as an all-up view of devices in an engineer’s service line. This dashboard will enable a self-service model that makes it even easier to isolate essential telemetry without the need for engineers to create or integrate their own interfaces.

That kind of observability isn’t only about outages. It also enables optimization by helping engineers understand and influence how devices work together.

The depth and quality of real-time telemetry data also provides a wealth of information for training AI models. With enough data spread across enough devices, predictive analysis might be able to provide preemptive alerts when the kinds of network signals that tend to accompany outages appear.

“We’re paving the way for an AIOps future where the system won’t just predict potential issues, but initiate self-healing actions,” says Rob Beneson, partner director of software engineering on the Infrastructure and Engineering Services team in Microsoft Digital.

It’s work that aligns with our company mission.

“This transformation is enhancing our internal user experience and maintaining the network connectivity that’s critical for our ultimate goal,” Beneson says. “We want to empower every person and organization on the planet to achieve more.”

Key Takeaways

Here are some tips for getting started with real-time telemetry at your company:

  • Start with your users. Ask them about pain points, what scares them, and what they need.
  • Start small and go step by step to get the core architecture in place, then work up to the glossier UI and UX elements.
  • Be mindful of onboarding challenges like bugs in vendor hardware and software, especially around security controls.
  • You’ll find plenty of edge cases and code fails, so be prepared to invest in revisiting challenges and fixing problems that arise.
  • Make sure you have a use case and a problem to solve. Have a plan to guide your adoption and use before you turn on real-time telemetry.
  • Make sure you have the proper data infrastructure in place and an apparatus for storing your data.
  • Communicate and demonstrate the value of this solution to the teams who need to invest resources into onboarding it.
  • Prioritize visibility into the devices and data you’ve onboarded through pilots and hero scenarios, then scale onboarding further according to your teams’ needs.
  • Integrate as much as possible. Consider visualizations and pushing into existing network graphs and tools to surface data where engineers already work.

The post Finding and fixing network outages in minutes—not hours—with real-time telemetry at Microsoft appeared first on Inside Track Blog.

Enhancing space management internally at Microsoft with Wi-Fi data http://approjects.co.za/?big=insidetrack/blog/enhancing-space-management-internally-at-microsoft-with-wi-fi-data/ Thu, 18 Jul 2024 16:00:55 +0000 http://approjects.co.za/?big=insidetrack/blog/?p=15346 Space management and employee engagement are two critical aspects of any modern workplace, including internally here at Microsoft. Figuring out how to get both right leads to important questions: How can organizations understand the best use of their building spaces, including offices and common spaces, while providing better experiences for their employees? How can they […]

The post Enhancing space management internally at Microsoft with Wi-Fi data appeared first on Inside Track Blog.

Space management and employee engagement are two critical aspects of any modern workplace, including internally here at Microsoft.

Figuring out how to get both right leads to important questions:

How can organizations understand the best use of their building spaces, including offices and common spaces, while providing better experiences for their employees? How can they reduce the cost and complexity of installing and maintaining IoT sensors to measure people density in different areas? How can they protect the privacy of employees and their devices and comply with privacy regulations?

This is what we asked ourselves when we set out to enhance both space utilization and the experience our employees have when they go into the office in our brand-new buildings here at Microsoft headquarters in Redmond, Washington.

We in Microsoft Digital, the company’s IT organization, knew that each new building would come with a wireless access point (WAP) system that employees use to access Wi-Fi. We knew the data from the access points could be used to measure the people density in different areas. The question was, how could we use this data to provide real-time insights to employees and facility managers privately and securely?

We identified an opportunity to reuse the existing devices and the data that we already had from these devices. It was a cost-optimized way of handling our requirements.

— Nritya Reddy, senior product manager, Microsoft Digital

Using WAP data to measure space utilization

From left to right, a composite image of Reddy, Lee, Chimbili, Kothamasu, Sadasivuni, and Kumar.
Improving our space management using Microsoft Azure and AI is the focus for (top row, from left) Nritya Reddy, Daniel Lee, and Veeren Kumar Chimbili; and (bottom row, from left) Lakshmi Kothamasu, Sudhakar Sadasivuni, and Bharath Kumar.

Our solution, Space Busyness Insights, uses our standard Wi-Fi WAP devices located throughout each building to calculate data on space utilization. This data includes identifying unused areas, occupied spaces and the crowd density, and the availability and use of common areas. By analyzing this data, we can make informed decisions about how to best allocate additional space or repurpose existing areas for more effective use. Additionally, we can plan for future real estate requirements.

“We identified an opportunity to reuse the existing devices and the data that we already had from these devices,” says Nritya Reddy, a senior product manager on the Microsoft Digital team. “It was a cost-optimized way of handling our requirements.”

This solution benefits our employees by letting them view, in real time, the availability and activity in shared spaces such as kitchenettes and conference rooms. To implement this solution, we collaborated with our infrastructure and security team, InnerSpace (a third-party vendor), and Microsoft facilities managers. We integrated AI to enhance our data measurement and analysis capabilities, enabling us to create actionable plans for space management.

“The era of modern smart experiences with IoT hardware demands innovative solutions that can be stitched across multiple devices and protocols with a cost-efficient design and architecture. I consider this as an opportunity to use the signals from two ecosystems to build secure, privacy-protected, smart building experiences. This gives us further opportunities to explore various use cases with WAP technology without additional hardware integrations,” says Sudhakar Sadasivuni, principal group engineering manager, Microsoft Digital.

Our innovative approach of repurposing existing devices for new requirements emphasized cost optimization and helped us be frugal with our resources.

“We have an existing Wi-Fi infrastructure in all our buildings, provisioned via WAP devices by different vendors. They can provide a list of devices that are Wi-Fi-connectable and in the discoverable range of the given WAP device,” says Reddy. “By employing artificial intelligence and machine learning on this raw data, we can triangulate people density. Meaning, you would know how many people are in that specific area based on some of the devices that these people are carrying, either a laptop or a mobile phone, which are discovered by these WAP data points.”

We sell primarily to the largest enterprises, so we needed to build on a robust, highly secure, highly scalable, and universally trusted cloud infrastructure.

— Matt MacGillivray, co-founder and VP of Research and Development at InnerSpace

We partnered with InnerSpace, a vendor whose system has the logic and AI/ML capabilities to make sense of the raw data coming from the WAPs and provide meaningful people-count data.

“We sell primarily to the largest enterprises, so we needed to build on a robust, highly secure, highly scalable, and universally trusted cloud infrastructure,” says Matt MacGillivray, co-founder and VP of Research and Development at InnerSpace.

He shared how they used Microsoft Azure services to run their logic and provide the output.

“We used Azure Kubernetes to provide elastic capacity for our ingest pipeline and datastore, Azure App Services to run our client-facing web-based tooling, and Azure Container Instances to deploy containerized subsystems without needing to manage the machines running them,” MacGillivray says.

InnerSpace also uses proprietary AI logic to ensure that people aren’t double-counted because they might be carrying more than one device. Based on the proximity of those devices and other logic and rules in place, they can help us determine space usage.
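InnerSpace’s deduplication logic is proprietary, but the general idea can be sketched: devices that show up with very similar signal strength at the same access point are likely traveling together (one person’s laptop and phone), so they collapse into a single cluster. Here’s a minimal, purely illustrative heuristic in Python; the function name and threshold are hypothetical, not InnerSpace’s actual algorithm:

```python
def estimate_people(observations, rssi_threshold=6):
    """Estimate a people count from device observations at one access point.

    observations: list of (device_hash, rssi_dbm) tuples seen in one scan.
    Devices whose signal strengths fall within `rssi_threshold` dB of the
    previous device (after sorting) are assumed to travel together, e.g.
    one person's laptop plus phone, and count as a single cluster.
    """
    # Sort by signal strength so physically co-located devices are adjacent.
    ordered = sorted(observations, key=lambda o: o[1])
    clusters = 0
    last_rssi = None
    for _, rssi in ordered:
        if last_rssi is None or rssi - last_rssi > rssi_threshold:
            clusters += 1  # start a new cluster -> one more estimated person
        last_rssi = rssi
    return clusters

# Four devices, two pairs with near-identical signal strength.
scan = [("h1", -42), ("h2", -43), ("h3", -67), ("h4", -65)]
print(estimate_people(scan))  # -> 2
```

A production system would combine many scans over time and across access points; this sketch only shows why raw device counts overstate people counts.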

The device identifiers are shared between systems in a hashed way. This ensures that specific devices discovered cannot be identified and personal identification information (PII) is protected. We performed stringent Microsoft architectural and data privacy reviews to ensure that no private data is being leaked at any stage. In addition to privacy, scaling and security are other key aspects considered when exchanging data with external systems.

— Lakshmi Kothamasu, principal software engineering manager, Microsoft Digital

We implemented this solution in our Redmond East Campus buildings, and through this process we gather the space utilization information we need while keeping two goals in mind for our employees:

  • Protect our employees’ personal information and privacy
  • Comply with privacy regulations

To make our solution work with these two goals in mind, we hash the media access control (MAC) addresses of the devices to anonymize the data we send to the third parties, and we perform Microsoft privacy reviews. We only provide InnerSpace with information that they need to analyze the data and make sure to protect everything else. Any data that can be identifiable and linked to a specific device or person(s) is hashed.
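The hashing step described above can be illustrated with a keyed one-way hash: with a secret key that never leaves Microsoft, the same device always maps to the same token (so repeat sightings can still be correlated), but the MAC address can’t be recovered or brute-forced from the token. The key value and function name here are hypothetical; the actual internal scheme isn’t public:

```python
import hashlib
import hmac

# Secret key held internally only; hypothetical value for illustration.
SECRET_KEY = b"rotate-me-regularly"

def anonymize_mac(mac: str) -> str:
    """Return a keyed, one-way hash of a MAC address (illustrative sketch)."""
    # Normalize separators and case so every format of the same MAC
    # produces the same token.
    normalized = mac.lower().replace("-", ":").encode("utf-8")
    return hmac.new(SECRET_KEY, normalized, hashlib.sha256).hexdigest()

# Both spellings of the same device yield one stable, irreversible token.
print(anonymize_mac("AA:BB:CC:DD:EE:FF") == anonymize_mac("aa-bb-cc-dd-ee-ff"))  # True
```

Only tokens like these need to cross the boundary to the vendor; the raw MAC addresses stay inside Microsoft.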

After hashing, the data is pushed by our internal Microsoft team to our Device Management Services (DMS) Azure Event Hub.

From there, we have a federated authentication mechanism in place for our vendor, InnerSpace, for them to access the anonymized data from our Azure Event Hub. InnerSpace then runs their logic over that data and provides the people count in a space context back to Microsoft.

We also ensure that InnerSpace has our building maps with the access point (AP) IDs and locations on them so they can run their triangulation algorithms to pinpoint the number of people in any space at a given point in time.
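InnerSpace’s triangulation algorithms are their own, but the basic idea of locating a device from AP positions and signal strengths can be sketched as a signal-weighted centroid: stronger (less negative) RSSI readings pull the estimate toward that access point. This is a simplified stand-in, not the vendor’s actual method, and all names and values below are hypothetical:

```python
def locate(ap_positions, readings):
    """Approximate a device position as an RSSI-weighted centroid of APs.

    ap_positions: {ap_id: (x, y)} taken from the building map.
    readings: {ap_id: rssi_dbm} for one device at one moment.
    Real triangulation uses calibrated signal-propagation models; this
    weighted centroid is only an illustrative sketch.
    """
    # Convert dBm to a rough linear weight: stronger signal -> larger weight.
    weights = {ap: 10 ** (rssi / 20) for ap, rssi in readings.items()}
    total = sum(weights.values())
    x = sum(ap_positions[ap][0] * w for ap, w in weights.items()) / total
    y = sum(ap_positions[ap][1] * w for ap, w in weights.items()) / total
    return (x, y)

aps = {"ap1": (0.0, 0.0), "ap2": (10.0, 0.0)}
# Much stronger signal at ap1, so the estimate lands well inside ap1's half.
x, y = locate(aps, {"ap1": -40, "ap2": -70})
print(round(x, 1))  # -> 0.3
```

Aggregating many such estimates per space, over time, is what turns device positions into the people-density data the facilities team uses.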

When we get that information back, we can then use that data to review and analyze the information and make space utilization decisions.

“The device identifiers are shared between systems in a hashed way. This ensures that specific devices discovered cannot be identified and personal identification information (PII) is protected. We performed stringent Microsoft architectural and data privacy reviews to ensure that no private data is being leaked at any stage. In addition to privacy, scaling and security are other key aspects considered when exchanging data with external systems,” says Lakshmi Kothamasu, principal software engineering manager, Microsoft Digital.

A diagram of how our network traffic architecture flows from the WAP system to InnerSpace.

“We underwent a review where the subject matter experts on the privacy team reviewed the entire architecture and made sure that no device identifier or personally identifiable information of the employee was directly or indirectly being passed on,” Kothamasu says.

A diagram of how we follow the process flow to get information from the WAP system to InnerSpace.

Using our solution to plan for future needs

The benefit of this solution is that it enables real estate and facilities managers to optimize the space utilization and plan for future needs, and it empowers employees to make informed decisions about where and when to use common areas, such as the kitchenettes and the meeting rooms, in real time. We also use a smart building kiosk that allows employees to access the data in a simple and intuitive way.

“The smart building kiosk can be used to open an app, look at a map on the web, or go to a kiosk to see a map. When the employee zooms in, they can see if a space is busy or not,” Reddy says.

Maximizing cost savings

By using the existing WAP system instead of installing new sensors, we saved around $3 million in hardware costs for the East Campus buildings. Because the WAP system exists in all buildings, we can easily enable this solution in other buildings without additional hardware costs.

The cost avoidance isn’t just about not having to buy those IoT sensors and installing them, but also the continued maintenance and security of those devices. You have firmware updates and security updates in the future, so the life cycle costs come down to quite a bit of savings from not having to implement duplicative infrastructure.

— Daniel Lee, regional lead, Center of Innovation, Microsoft Global Workplaces Services

The cost savings go beyond just the hardware. In every building, the fundamental IT infrastructure includes WAP devices, which are essential for providing Wi-Fi connectivity. Our internal Microsoft team has developed a highly configurable solution that doesn’t require any code changes. To integrate a new building, we simply need to update the configuration with the new AP layout, and the system operates seamlessly. While the initial implementation at the East Campus took about three months, the process has been significantly streamlined for other locations and can now be completed in just a week or two.
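The configuration-driven onboarding described above can be imagined as a per-building entry plus a sanity check before go-live. Every field name below is hypothetical, not the actual internal schema:

```python
# Hypothetical per-building configuration entry: onboarding a new building
# means adding an entry like this rather than changing code.
BUILDING_CONFIG = {
    "building_id": "redmond-east-b1",           # assumed identifier format
    "event_hub": "dms-space-busyness",          # assumed destination name
    "access_points": [
        # ap_id and map coordinates come from the building's AP layout.
        {"ap_id": "ap-101", "floor": 1, "x": 12.5, "y": 4.0},
        {"ap_id": "ap-102", "floor": 1, "x": 30.0, "y": 4.0},
    ],
}

def validate(config):
    """Minimal sanity check before a new building's config goes live."""
    assert config["building_id"] and config["access_points"]
    for ap in config["access_points"]:
        # Each AP needs an ID and map coordinates for triangulation.
        assert {"ap_id", "floor", "x", "y"} <= ap.keys()
    return True

print(validate(BUILDING_CONFIG))  # -> True
```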

“The cost avoidance isn’t just about not having to buy those IoT sensors and installing them, but also the continued maintenance and security of those devices. You have firmware updates and security updates in the future, so the life cycle costs come down to quite a bit of savings from not having to implement duplicative infrastructure,” says Daniel Lee, a regional lead on the Center of Innovation team in Microsoft Global Workplaces Services.

When considering a building, whether it’s a leased space for customers or a company’s own property, optimizing the use of space is crucial. Real estate comes with significant costs, not only in acquisition but also in ongoing maintenance. We need to ensure that employees are making effective use of these spaces. If they’re not, it’s important to understand why, so that we can address any issues and improve space utilization.

Gaining additional benefits

We’ve talked a lot about the benefits for space planning and cost reduction, but let’s also look at other benefits of using the solution:

  • Data-driven decisions: Removing emotional guesswork from space planning with clear-cut data on actual space usage.
  • Holistic analysis: Combining WAP data with other sensor signals like lighting and air quality for comprehensive space planning.
  • Rapid deployment: Streamlined process for implementing the solution in new locations.

By gathering and using the WAP device data, we can not only optimize space utilization but also gain insights into what our employees need from us to optimize their experience.

How other companies can benefit from our solution

We have an aspiration of rolling our solution into the upcoming product Microsoft Places and making it self-sustained and scalable. Places is a product that aims to provide a holistic view of the physical and digital spaces in an organization and how they’re used by the employees.

I believe the key advantages of our solution are, first, the enhanced security that comes with not having to add extra hardware or devices. Second, we’ve managed to reduce the number of devices installed across the buildings. And third, because of these improvements, we’ve achieved additional cost savings for Microsoft. That’s the significant impact this solution has delivered to Microsoft.

— Bharath Kumar, principal PM manager, Microsoft Digital

We’re currently using this solution in seven buildings and our goal is to continue implementing this solution in our other buildings.

“I believe the key advantages of our solution are, first, the enhanced security that comes with not having to add extra hardware or devices. Second, we’ve managed to reduce the number of devices installed across the buildings. And third, because of these improvements, we’ve achieved additional cost savings for Microsoft. That’s the significant impact this solution has delivered to Microsoft,” says Bharath Kumar, principal PM manager, Microsoft Digital.  

Other companies that have similar space management and employee engagement needs could benefit from Microsoft’s solution, because it uses existing Wi-Fi infrastructure, reduces the dependency on external sensors, protects the privacy of the employees, and provides a simple and intuitive way to access the data.

“Our aspiration, as we productize this solution, is to eliminate the dependency on anything but the actual product itself. One product we used is Azure Digital Twins, which gets the whole experience lighted by making sense of people count against the space and processing that information,” Reddy says.

Key takeaways

Here are some tips on getting started at your company:

  • Consider implementing a similar solution to optimize space utilization and improve employee experience in your own buildings.
  • Use existing Wi-Fi infrastructure to reduce cost and dependency on external sensors and vendors.
  • Ensure that the solution protects employee privacy and complies with privacy regulations.
  • Stay informed about the latest developments and best practices in the field of space utilization and employee experience.

Try it out

Create your own Azure free account today on the Microsoft Azure product page.

We’d like to hear from you!

Want more information? Email us and include a link to this story and we’ll get back to you.

The post Enhancing space management internally at Microsoft with Wi-Fi data appeared first on Inside Track Blog.
