{"id":5198,"date":"2023-08-02T09:00:02","date_gmt":"2023-08-02T16:00:02","guid":{"rendered":"https:\/\/www.microsoft.com\/insidetrack\/blog\/?p=5198"},"modified":"2023-08-02T10:18:15","modified_gmt":"2023-08-02T17:18:15","slug":"running-on-vpn-how-microsoft-is-keeping-its-remote-workforce-connected","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/insidetrack\/blog\/running-on-vpn-how-microsoft-is-keeping-its-remote-workforce-connected\/","title":{"rendered":"Running on VPN: How Microsoft is keeping its remote workforce connected"},"content":{"rendered":"
To ensure that employees had a reliable hybrid work experience at the onset of the COVID-19 pandemic, Steve Means, principal cloud network engineering manager in Microsoft Digital Employee Experience, and his team set out to make sure that the company\u2019s internal network would hold up.<\/p>\n
They were cautiously optimistic\u2014the team had just rebuilt the entire network, including the virtual private network (VPN). This network supports access to key internal servers with protected data, personnel information, and other critical assets that must be on lockdown.<\/p>\n
\u201cOur network has done very well for employees working remotely,\u201d Means says. \u201cSo far, we\u2019ve seen a really strong performance from our network and VPN, specifically.\u201d<\/p>\n
The strong response has been fueled by an earlier decision the team made to reduce the workload that the company pushes through its VPN pipes. The team did that by implementing split tunneling at most of its locations worldwide, which funnels the majority of the company\u2019s mobile workload to the internet.<\/p>\n
Split tunneling became possible because Microsoft is nearly 100 percent in the cloud, which allows its remote workers to access core applications and experiences over the internet via Microsoft Azure and Office 365. Before the company migrated to the cloud, everything would have been routed through VPN.<\/p>\n
\u201cIt really helps us that most of our mobile workload\u2014including traffic to high volume and performance sensitive Office 365 and Azure applications\u2014is securely routed directly over the internet,\u201d Means says.<\/p>\n
In retrospect, adopting split tunneling was a pivotal decision.<\/p>\n
\u201cIt is allowing our employees to maintain their normal level of productivity even as they work remotely,\u201d he says.<\/p>\n
He points to how employees are now using Microsoft Teams as an example.<\/p>\n
\u201cOur employees have significantly increased their usage of voice and video conferencing on Teams,\u201d he says. \u201cWe\u2019ve been able to sustain this massive spike in Teams usage without major issues because it\u2019s being routed over the internet\u2014leaving our VPN capacity for just necessary connections between users and our internal resources.\u201d<\/p>\n
There have been challenges, however, which began when Microsoft\u2019s employees in China started working from home.<\/p>\n
\u201cUnlike here at our headquarters and other worldwide locations, when our employees in China work remotely, everything they do goes exclusively through our VPN pipe,\u201d Means says.<\/p>\n
That meant 100 percent of the workload of employees in Shanghai and Beijing was suddenly going through already heavily used VPN gateways.<\/p>\n
\u201cIt was almost an overnight phenomenon,\u201d Means says. \u201cWe were suddenly seeing usage of 85 to 95 percent of our network bandwidth and our VPN capacity.\u201d<\/p>\n
Already tight before the spread of COVID-19 began, VPN was quickly becoming a bottleneck in China.<\/p>\n
\u201cWe started asking ourselves a lot of questions,\u201d Means says. \u201cCan we handle the expected number of concurrent VPN sessions? How is bandwidth holding up for employees? What\u2019s their experience like? Are they all being successful?\u201d<\/p>\n
Quick action was needed.<\/p>\n
\u201cWe had data to answer all the questions, but what we didn\u2019t have was a single pane of glass where we could quickly look at everything to see what was happening across the company\u2019s infrastructure,\u201d Means says. \u201cAnd company leaders were trying to figure out how to respond to the crisis\u2014they needed data from us, and they needed it quickly.\u201d<\/p>\n
The answer was to identify the data that mattered the most and aggregate it into a Microsoft Power BI dashboard, which the company now uses to track all its VPN systems as the COVID-19 situation evolves.<\/p>\n
As for the offices in Shanghai and Beijing, Means\u2019s team worked with local internet providers to increase VPN capacity by 50 percent so they had enough headroom to handle the new usage safely.<\/p>\n
\u201cThat was a budget decision,\u201d Means says. All they had to do was sign some contracts\u2014no new hardware was needed. \u201cOnce we agreed that it was the right thing to do, we were able to remove that bottleneck in less than a day.\u201d<\/p>\n
[<\/em>Explore using a Zero Trust strategy to secure Microsoft\u2019s network during remote work.<\/em><\/a>\u00a0<\/em>Unpack enhancing VPN performance at Microsoft.<\/em><\/a>\u00a0<\/em>Discover how Microsoft Sentinel protects Microsoft from cybersecurity attacks.<\/em><\/a>]<\/em><\/p>\n The notion of Microsoft\u2019s employees and vendors frequently working remotely was daunting, but Means was confident that its VPN infrastructure would support that sudden spike in demand.<\/p>\n Three years ago, he would not have been so optimistic.<\/p>\n \u201cWe were in a tough spot a few years ago,\u201d Means says. \u201cWe had multiple and complex reasons for why our employees\u2019 end-to-end VPN experience wasn\u2019t very strong\u2014it was a complicated stack that had multiple potential failure points.\u201d<\/p>\n The team ran into issues on the Windows side, there were challenges with the network, and the company was using several different VPN clients at once, which created confusion and complexity for employees. Means\u2019s team worked closely with the Windows team, and through direct partnership and engagement, helped drive significant stability improvements in the Windows native VPN client.<\/p>\n \u201cWe saw a connectivity success rate in the 60 to 65 percent range, which is very low,\u201d Means says. \u201cThat meant that a third of people would run into an issue every time they tried to work remotely.\u201d<\/p>\n A fix was needed.<\/p>\n \u201cWe knew this could become a problem if we had a situation where we needed many of our employees to work remotely,\u201d Means says. \u201cSo, we invested heavily in strengthening our VPN service by focusing on the user experience and partnering closely with internal teams.\u201d<\/p>\n \u201cWe built the new system so it could support over 200,000 concurrent sessions,\u201d Means says. 
\u201cIn an extreme situation, we could support that many people on VPN at the same time.\u201d<\/p>\n Microsoft has 221,000 employees and a large contingent of vendors who work on the company\u2019s network. They don\u2019t all work at the same time, but the goal was to cover the worst-case scenario and to future-proof the solution.<\/p>\n \u201cAcross the world, we normally have about 55,000 employees connect via VPN on a given day,\u201d Means says. \u201cWith everyone working remotely, that has climbed as high as 128,000 employees and vendors per day, including about 45,000 per day at our headquarters in Redmond.\u201d<\/p>\n Previously, employees used a large number of gateways to access the company\u2019s internal network, but many of those gateways provided poor connectivity.<\/p>\n \u201cWe consolidated the gateways to data centers and locations with reliable and plentiful bandwidth,\u201d Means says. \u201cThis shrunk the number of gateway sites, but increased overall reliability and made it so we could handle more concurrent connections.\u201d<\/p>\n The hybrid design that the team put together uses Microsoft Azure Traffic Manager to geolocate VPN users. \u201cThat allowed us to send them to their nearest gateway and to meet scale demands,\u201d he says. \u201cWe used Azure Active Directory (AAD) to authenticate our users and to validate the status of their device before allowing them on VPN.\u201d<\/p>\n The team also began using servers that can handle 30,000 or 60,000 users each, much more than the old servers that could only handle 750 to 2,000 users. \u201cTheoretically, we could now handle 500,000 concurrent VPN connections worldwide,\u201d Means says.<\/p>\n Means says the improvement in the company\u2019s VPN service was substantial, so much so that employees forgot it was working behind the scenes when they worked remotely.<\/p>\n Despite being worked harder than ever before, the company\u2019s VPN infrastructure is performing at a high level. 
\u201cKnock on wood, there have been no major incidents,\u201d Means says.<\/p>\n Importantly, VPN is allowing employees to get their work done.<\/p>\n \u201cToday, even as many of our employees work remotely, our success rate is at 92 percent,\u201d Means says. \u201cThat\u2019s one of the highest rates we’ve ever recorded\u2014the only reason it isn\u2019t at 99 percent is because that number includes drops because of reboots during patch updates, getting disconnected from Wi-Fi, and home network or internet service provider issues.\u201d<\/p>\n Employee productivity also has held strong.<\/p>\n \u201cWe measure employee productivity, and the productivity of our software engineers in particular,\u201d Means says. \u201cWe look at pull requests, commits per day, and other indicators\u2014so far, we haven’t seen any measurable drop in work performance.\u201d<\/p>\n Means says the situation is creating a learning moment for his team.<\/p>\n \u201cOne thing that we’re learning is it’s really about the data,\u201d he says. \u201cThere are so many things we can measure\u2014finding the right things to measure so we can take the right actions is critical.\u201d<\/p>\n The team\u2019s data-centric approach to VPN and networking also has allowed it to make smart investments, like provisioning capacity only when required. It also helps the team respond quickly when needed\u2014as was the case when Italy tightened its remote working restrictions.<\/p>\n \u201cWe doubled capacity in London, which is where we run the VPN connection for our employees in Italy,\u201d Means says. \u201cHaving good data allows us to quickly take proactive action when needed and to stay ahead of the game at all times.\u201d<\/p>\n The team also saw the potential for a bottleneck at its headquarters in Redmond, Washington, where the number of concurrent sessions that VPN needed to support was climbing close to capacity. 
The company addressed this concern by adding another VPN gateway.<\/p>\n \u201cThis has caused us to reflect on our readiness efforts overall,\u201d Means says. \u201cWe\u2019ve used this as an opportunity to improve how we do things.\u201d<\/p>\n The team expects to keep learning and adding to the VPN capabilities.<\/p>\n <\/p>\n For enterprises and organizations looking to optimize and scale out their VPN capabilities, some of the best practices shown above and recommended by Microsoft are:<\/p>\n Finally, and probably most important, know the limits of your VPN connection infrastructure and how to scale out in times of need. Limits such as total available bandwidth and maximum concurrent user connections per device will determine when you\u2019ll need to add more VPN devices.<\/p>\n If your devices are physical hardware, having additional supply on hand or a rapid supply chain source will be critical. For cloud solutions, knowing ahead of time how and when to scale will make the difference.<\/p>\n Azure offers a native highly-scalable VPN gateway, as well as the most common third-party VPN and SD-WAN network virtual appliances in the Azure Marketplace<\/a>.<\/p>\n For more information on these and other Azure and Office network optimization practices, please see:<\/p>\n <\/p>\n To ensure that employees had a reliable hybrid work experience at the onset of the COVID-19 pandemic, Steve Means, principal cloud network engineering manager in Microsoft Digital Employee Experience, and his team set out to make sure that the company\u2019s internal network would hold up. 
They were cautiously optimistic\u2014the team had just rebuilt the entire […]<\/p>\n","protected":false},"author":21,"featured_media":11948,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"_hide_featured_on_single":false,"_show_featured_caption_on_single":true,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","enabled":false},"version":2}},"categories":[1],"tags":[239,95,543,430],"coauthors":[138],"class_list":["post-5198","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-uncategorized","tag-network","tag-security","tag-split-tunneling","tag-vpn","program-microsoft-digital-perspectives","m-blog-post"],"jetpack_publicize_connections":[],"yoast_head":"\nInvestments in VPN infrastructure paying off<\/h2>\n
Tips for retooling VPN at your company<\/h2>\n
\n
\n
\n