“How do I get on the Wi-Fi?”
It’s often one of the first questions visitors ask when arriving at a new location, and here at Microsoft, it happens every day.
Our guest Wi-Fi network is often the first Microsoft service that visitors use. We’re continually evaluating and improving guest Wi-Fi services to increase service resiliency, simplify the registration process, and provide seamless connectivity for a wide range of device types.
Our network access control (NAC) solution authenticates devices that connect to the guest Wi-Fi service and ensures that these devices are placed in the appropriate network location. Every device—corporate, guest, or Internet of Things (IoT)—must use the NAC solution to gain access to the network.
Like many organizations, we hosted our NAC solution on on-premises components and infrastructure for years. However, with end of support looming for our previous NAC solution, we recently migrated NAC for our guest Wi-Fi to the cloud on Microsoft Azure.
Our previous solution presented several challenges to running a robust, resilient, and highly available guest Wi-Fi service that continuously met the requirements and demands of our guest users. Those challenges included:
- Costly infrastructure: All hardware fails eventually. These failures forced ongoing maintenance, and our design teams needed to deploy redundant components to ensure service continuity and adequate performance. The effort to perform maintenance—including returning and ordering hardware, implementing failover protocols, and tracking processes—was expensive.
- Lack of agility: Planned work windows on upstream infrastructure sometimes required us to implement temporary changes or workarounds to keep the service running. This restricted the time when partner teams could perform upgrades or make configuration changes, slowing everyone down.
- Lack of regional resiliency: Without a cross-regional load-balancing solution for the Wi-Fi registration portal, we had to schedule software upgrades region by region, which required coordination with multiple partner teams around the world.
- Lack of scalability: The high cost of redundancy for physical infrastructure made it too expensive to maintain a scalable solution across all regions. Hosting a NAC solution for every office or data center with Wi-Fi was cost-prohibitive, so redundancy and scalability were problems for some regions.
Hosting NAC services in Microsoft Azure
We weighed these challenges against the available options for a new NAC solution and decided to host our NAC services in Microsoft Azure.
NAC services are, ultimately, software-defined networking, and we took the opportunity to distribute our NAC services in Azure, where cloud networking and infrastructure as code allow us to radically improve scalability, resiliency, and agility for our NAC services. We’re hosting NAC services across four Azure global regions for redundancy, high availability, and performance purposes.
We run our NAC services on pre-configured images from the Azure Marketplace. These images can be deployed, redeployed, or upgraded using Azure Resource Manager (ARM) templates, significantly reducing maintenance efforts. The images run on Azure virtual machines, whose natively resilient architecture provides cost-effective scalability and redundancy.
Azure ExpressRoute connects our on-premises Wi-Fi networks to the cloud-hosted NAC services in Azure. Because traffic travels over the Azure backbone network, we still achieve high-performance, low-latency connectivity between our Wi-Fi networks and the cloud-hosted NAC services. We use redundant configuration and peering for our ExpressRoute connections, which adds scalability and resiliency. As a result, we can perform maintenance on any on-premises router or ExpressRoute configuration without affecting service availability.
Azure Traffic Manager allows us to standardize our guest services across hundreds of Wi-Fi controllers globally with a single captive portal URL that directs every user to the nearest NAC instance in Azure. If a regional outage occurs, Traffic Manager’s built-in redundancy seamlessly reroutes traffic to a secondary Azure region.
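To make that routing behavior concrete, here’s a minimal Python sketch of the selection logic: send each user to the lowest-latency region whose health check passes, and fall back to the next-best region when one fails. It only simulates the decision and doesn’t call any Traffic Manager API; the region names and latency figures are hypothetical. In the real service, Traffic Manager makes this choice at DNS resolution time using its own latency data and endpoint monitoring.

```python
from dataclasses import dataclass


@dataclass
class RegionEndpoint:
    name: str          # hypothetical region label, not a real deployment name
    healthy: bool      # latest endpoint health-check result
    latency_ms: float  # assumed client-to-region latency


def pick_portal_region(endpoints: list[RegionEndpoint]) -> str:
    """Return the lowest-latency healthy region (performance routing with failover)."""
    healthy = [e for e in endpoints if e.healthy]
    if not healthy:
        raise RuntimeError("no healthy NAC region available")
    return min(healthy, key=lambda e: e.latency_ms).name


# Four hypothetical regions, mirroring the four-region deployment described above.
regions = [
    RegionEndpoint("region-a", True, 12.0),
    RegionEndpoint("region-b", True, 65.0),
    RegionEndpoint("region-c", True, 140.0),
    RegionEndpoint("region-d", True, 180.0),
]
print(pick_portal_region(regions))  # region-a
regions[0].healthy = False          # simulate an outage in the nearest region
print(pick_portal_region(regions))  # region-b
```

Users keep the same portal URL throughout; only the region behind it changes when a health check fails.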
Azure Load Balancer distributes and monitors NAC RADIUS traffic across multiple NAC virtual machine instances in a region. It checks the health of each NAC component and routes RADIUS requests only to the NAC virtual machines that are ready to receive them. Defining the health check on the upstream load balancer removes the need for each network device to retry and fail over individually, which reduces authentication time for end users.
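The following Python sketch illustrates the idea under simplifying assumptions: each RADIUS flow is hashed on its five-tuple (Azure Load Balancer’s default distribution mode is also based on a five-tuple hash), and only VMs whose latest health probe succeeded are eligible. The specific hash function, VM names, and IP addresses are hypothetical, and the sketch doesn’t reproduce the load balancer’s actual algorithm.

```python
import hashlib

RADIUS_AUTH_PORT = 1812  # standard RADIUS authentication port (RFC 2865)


def flow_hash(src_ip: str, src_port: int, dst_ip: str, dst_port: int, protocol: str) -> int:
    """Deterministically hash a five-tuple so packets in one flow map to one backend."""
    key = f"{src_ip}|{src_port}|{dst_ip}|{dst_port}|{protocol}".encode()
    return int.from_bytes(hashlib.sha256(key).digest()[:8], "big")


def choose_backend(flow: tuple, backends: list[str], probe_ok: dict[str, bool]) -> str:
    """Pick a backend for the flow, considering only VMs whose health probe passed."""
    healthy = [b for b in backends if probe_ok.get(b, False)]
    if not healthy:
        raise RuntimeError("no healthy NAC VM in this region")
    return healthy[flow_hash(*flow) % len(healthy)]


backends = ["nac-vm-1", "nac-vm-2", "nac-vm-3"]  # hypothetical VM names
probe_ok = {"nac-vm-1": True, "nac-vm-2": False, "nac-vm-3": True}
flow = ("10.1.2.3", 50123, "10.200.0.10", RADIUS_AUTH_PORT, "udp")  # hypothetical addresses
print(choose_backend(flow, backends, probe_ok))  # nac-vm-1 or nac-vm-3, never nac-vm-2
```

Because the hash is deterministic, retransmissions of the same request land on the same VM as long as the set of healthy VMs doesn’t change, and the wireless controller never has to know which VM is down.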
Azure Backup allows us to seamlessly manage business continuity and disaster recovery for each NAC virtual machine. We use virtual hard disk snapshots to capture the virtual machine state and enable rapid, integrated recovery for any NAC virtual machine, regardless of region.
Azure Serial Console provides reliable access to the text-based console of each virtual machine, even when network configuration problems or other issues prevent connecting to it by IP address.
Transitioning NAC services seamlessly to the cloud
Our migration required extensive planning to keep NAC services available throughout. We ran the previous on-premises NAC service and the new cloud-based NAC service in Azure in parallel and migrated buildings in phases. This allowed us to review telemetry and collect feedback from each subset of migrated sites before moving to the next set of buildings, reducing the risk of widespread impact if we encountered an unforeseen issue or anomaly.
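A phased rollout like this boils down to a simple gate: migrate one wave of buildings, check its telemetry, and continue only if the results look healthy. The Python sketch below assumes authentication failure rate as the health signal and a 2 percent threshold; both are hypothetical choices for illustration, not values from our migration.

```python
from typing import Callable


def run_phased_migration(
    waves: list[list[str]],
    migrate: Callable[[str], None],
    failure_rate: Callable[[list[str]], float],
    max_failure_rate: float = 0.02,  # hypothetical go/no-go threshold
) -> tuple[list[str], bool]:
    """Migrate buildings wave by wave; stop if a wave's telemetry looks unhealthy.

    Returns the buildings migrated so far and whether every wave completed.
    """
    migrated: list[str] = []
    for wave in waves:
        for building in wave:
            migrate(building)
            migrated.append(building)
        # Review telemetry for this wave before moving on to the next one.
        if failure_rate(wave) > max_failure_rate:
            return migrated, False
    return migrated, True


def sample_failure_rate(wave: list[str]) -> float:
    """Hypothetical telemetry: the wave containing "B3" shows a 5% auth failure rate."""
    return 0.05 if "B3" in wave else 0.001


done, ok = run_phased_migration(
    [["B1"], ["B2", "B3"], ["B4"]], migrate=lambda building: None, failure_rate=sample_failure_rate
)
print(done, ok)  # ['B1', 'B2', 'B3'] False
```

Building B4 is never touched, which is the point: a problem surfaces in one small wave instead of across the whole campus.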
Developing quality network automation was critical to this project’s success. We created an Ansible playbook that evaluated the on-premises service configuration on 900 wireless controllers. The playbook applied the updated configuration only if the on-premises configuration was standard, which helped us avoid disrupting sites that had custom, site-specific guest service configurations. We also used this playbook to validate RADIUS connectivity to the new NAC servers before applying the final configuration, ensuring that all access control lists were correctly defined for each site.
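The playbook’s decision logic can be sketched in Python (rather than Ansible YAML) as follows. Each controller is compared against a standard guest configuration; custom sites are skipped, and standard sites are cut over only if RADIUS connectivity to the new NAC servers validates first. The configuration fields, controller names, and reachability check are hypothetical stand-ins for what the real playbook inspected.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical "standard" guest-service settings used as the comparison baseline.
STANDARD_GUEST_CONFIG = {"ssid": "Guest", "auth": "radius", "portal": "captive"}


@dataclass
class RolloutPlan:
    apply: list[str] = field(default_factory=list)           # safe to reconfigure
    skipped_custom: list[str] = field(default_factory=list)  # site-specific config, left alone
    radius_failed: list[str] = field(default_factory=list)   # standard, but new NAC unreachable


def plan_rollout(
    controllers: dict[str, dict],
    radius_reachable: Callable[[str], bool],
) -> RolloutPlan:
    """Decide per controller whether the new guest configuration may be applied."""
    plan = RolloutPlan()
    for name, config in controllers.items():
        if config != STANDARD_GUEST_CONFIG:
            plan.skipped_custom.append(name)  # never touch custom sites
        elif not radius_reachable(name):
            plan.radius_failed.append(name)   # fix ACLs before cutting over
        else:
            plan.apply.append(name)
    return plan


controllers = {  # hypothetical controller inventory
    "wlc-01": dict(STANDARD_GUEST_CONFIG),
    "wlc-02": {**STANDARD_GUEST_CONFIG, "portal": "custom-site-portal"},
    "wlc-03": dict(STANDARD_GUEST_CONFIG),
}
plan = plan_rollout(controllers, radius_reachable=lambda name: name != "wlc-03")
print(plan)
# RolloutPlan(apply=['wlc-01'], skipped_custom=['wlc-02'], radius_failed=['wlc-03'])
```

Checking connectivity only for standard controllers keeps the validation step focused on the sites that will actually be changed.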
Looking forward
Migrating our NAC infrastructure to Azure has significantly improved the resiliency and manageability of the guest Wi-Fi service at Microsoft. We can now focus our time on developing and enhancing the underlying NAC service on Azure.
With Azure, we can scale NAC services up or down without downtime and without planning around hardware constraints and failures. We also have richer telemetry and stronger disaster recovery capabilities, so we detect and remediate incidents more quickly.
We’ve transitioned our NAC services out of the datacenter, removing the need for additional hardware and extensive maintenance efforts, and our migration has been entirely transparent for users.
Our Azure-hosted NAC services have already saved us hundreds of engineering hours, allowing us to focus more on providing an excellent client experience on our guest and corporate networks. We’ll continue to improve NAC and guest Wi-Fi services at Microsoft, ensuring guests visiting Microsoft have access to a reliable and seamless Wi-Fi network in every Microsoft location across the globe.
Consider the following takeaways when assessing your organization’s potential for migrating network services to Microsoft Azure.
- Embrace the cloud for NAC: Move Network Access Control (NAC) services to the cloud with Microsoft Azure to improve scalability, resiliency, and agility while reducing maintenance efforts and costs.
- Simplify with software-defined networking: Transitioning to software-defined networking in Azure enables you to use pre-configured images and templates, enhancing redundancy and performance across global regions.
- Enhance connectivity with Azure ExpressRoute: Use Azure ExpressRoute to connect on-premises networks to cloud-hosted services seamlessly, ensuring high performance and low latency without compromising service availability during maintenance.
- Optimize with Azure Load Balancer and Backup: Azure Load Balancer monitors the health of your service instances and distributes traffic only to those that are ready, while Azure Backup offers robust business continuity and disaster recovery options.