Azure Identity and Security Archives - Inside Track Blog

Enhancing VPN performance at Microsoft

Inside Track staff — Sun, 26 Jan 2025 17:00:13 +0000

[Editor’s note: This content was written to highlight a particular event or moment in time. Although that moment has passed, we’re republishing it here so you can see what our thinking and experience was like at the time.]

Modern workers are increasingly mobile and require the flexibility to get work done outside of the office. Here at Microsoft headquarters in the Puget Sound area of Washington State, every weekday an average of 45,000 to 55,000 Microsoft employees use a virtual private network (VPN) connection to remotely connect to the corporate network. As part of our overall Zero Trust Strategy, we have redesigned our VPN infrastructure, something that has simplified our design and let us consolidate our access points. This has enabled us to increase capacity and reliability, while also reducing reliance on VPN by moving services and applications to the cloud.

Providing a seamless remote access experience

Remote access at Microsoft is reliant on the VPN client, our VPN infrastructure, and public cloud services. We have had several iterative designs of the VPN service inside Microsoft. Regional weather events in the past required large increases in employees working from home, heavily taxing the VPN infrastructure and requiring a completely new design. Three years ago, we built an entirely new VPN infrastructure, a hybrid design, using Microsoft Azure Active Directory (Azure AD) load balancing and identity services with gateway appliances across our global sites.

Key to our success in the remote access experience was our decision to deploy a split-tunneled configuration for the majority of employees. We have migrated nearly 100% of previously on-premises resources into Microsoft Azure and Microsoft Office 365. Our continued efforts in application modernization are reducing the traffic on our private corporate networks as cloud-native architectures allow direct internet connections. The shift to internet-accessable applications and a split-tunneled VPN design has dramatically reduced the load on VPN servers in most areas of the world.

Using VPN profiles to improve the user experience

We use Microsoft Endpoint Manager to manage our domain-joined and Microsoft Azure AD–joined computers and mobile devices that have enrolled in the service. In our configuration, VPN profiles are replicated through Microsoft Intune and applied to enrolled devices; these include certificate issuance that we create in Configuration Manager for Windows 10 devices. We support Mac and Linux device VPN connectivity with a third-party client using SAML-based authentication.

We use certificate-based authentication (public key infrastructure, or PKI) and multi‑factor authentication solutions. When employees first use the Auto-On VPN connection profile, they are prompted to authenticate strongly. Our VPN infrastructure supports Windows Hello for Business and Multi-Factor Authentication. It stores a cryptographically protected certificate upon successful authentication that allows for either persistent or automatic connection.

For more information about how we use Microsoft Intune and Endpoint Manager as part of our device management strategy, see Managing Windows 10 devices with Microsoft Intune.

Configuring and installing VPN connection profiles

We created VPN profiles that contain all the information a device requires to connect to the corporate network, including the supported authentication methods and the VPN gateways that the device should connect to. We created the connection profiles for domain-joined and Microsoft Intune–managed devices using Microsoft Endpoint Manager.

For more information about creating VPN profiles, see VPN profiles in Configuration Manager and How to Create VPN Profiles in Configuration Manager.

The Microsoft Intune custom profile for Intune-managed devices uses Open Mobile Alliance Uniform Resource Identifier (OMA-URI) settings with XML data type, as illustrated below.

Creating a Profile XML and editing the OMA-URI settings to create a connection profile in System Center Configuration Manager.

Installing the VPN connection profile

The VPN connection profile is installed using a script on domain-joined computers running Windows 10, through a policy in Endpoint Manager.

For more information about how we use Microsoft Intune as part of our mobile device management strategy, see Mobile device management at Microsoft.

Conditional Access

We use an optional feature that checks the device health and corporate policies before allowing it to connect. Conditional Access is supported with connection profiles, and we’ve started using this feature in our environment.

Rather than just relying on the managed device certificate for a “pass” or “fail” for VPN connection, Conditional Access places machines in a quarantined state while checking for the latest required security updates and antivirus definitions to help ensure that the system isn’t introducing risk. On every connection attempt, the system health check looks for a certificate that the device is still compliant with corporate policy.

Certificate and device enrollment

We use an Azure AD certificate for single sign-on to the VPN connection profile. And we currently use Simple Certificate Enrollment Protocol (SCEP) and Network Device Enrollment Service (NDES) to deploy certificates to our mobile devices via Microsoft Endpoint Manager. The SCEP certificate we use is for wireless and VPN. NDES allows software on routers and other network devices running without domain credentials to obtain certificates based on the SCEP.

NDES performs the following functions:

It generates and provides one-time enrollment passwords to administrators.
It submits enrollment requests to the certificate authority (CA).
It retrieves enrolled certificates from the CA and forwards them to the network device.

For more information about deploying NDES, including best practices, see Securing and Hardening Network Device Enrollment Service for Microsoft Intune and System Center Configuration Manager.

VPN client connection flow

The diagram below illustrates the VPN client-side connection flow.

The client-side VPN connection flow.

When a device-compliance–enabled VPN connection profile is triggered (either manually or automatically):

The VPN client calls into the Windows 10 Azure AD Token Broker on the local device and identifies itself as a VPN client.
The Azure AD Token Broker authenticates to Azure AD and provides it with information about the device trying to connect. A device check is performed by Azure AD to determine whether the device complies with our VPN policies.
If the device is compliant, Azure AD requests a short-lived certificate. If the device isn’t compliant, we perform remediation steps.
Azure AD pushes down a short-lived certificate to the Certificate Store via the Token Broker. The Token Broker then returns control back over to the VPN client for further connection processing.
The VPN client uses the Azure AD–issued certificate to authenticate with the VPN gateway.

Remote access infrastructure

At Microsoft, we have designed and deployed a hybrid infrastructure to provide remote access for all the supported operating systems—using Azure for load balancing and identity services and specialized VPN appliances. We had several considerations when designing the platform:

Redundancy. The service needed to be highly resilient so that it could continue to operate if a single appliance, site, or even large region failed.
Capacity. As a worldwide service meant to be used by the entire company and to handle the expected growth of VPN, the solution had to be sized with enough capacity to handle 200,000 concurrent VPN sessions.
Homogenized site configuration. A standard hardware and configuration stamp was a necessity both for initial deployment and operational simplicity.
Central management and monitoring. We ensured end-to-end visibility through centralized data stores and reporting.
Azure AD–based authentication. We moved away from on-premises Active Directory and used Azure AD to authenticate and authorize users.
Multi-device support. We had to build a service that could be used by as much of the ecosystem as possible, including Windows, OSX, Linux, and appliances.
Automation. Being able to programmatically administer the service was critical. It needed to work with existing automation and monitoring tools.

When we were designing the VPN topology, we considered the location of the resources that employees were accessing when they were connected to the corporate network. If most of the connections from employees at a remote site were to resources located in central datacenters, more consideration was given to bandwidth availability and connection health between that remote site and the destination. In some cases, additional network bandwidth infrastructure has been deployed as needed. The illustration below provides an overview of our remote access infrastructure.

Microsoft remote access infrastructure.

VPN tunnel types

Our VPN solution provides network transport over Secure Sockets Layer (SSL). The VPN appliances force Transport Layer Security (TLS) 1.2 for SSL session initiation, and the strongest possible cipher suite negotiated is used for the VPN tunnel encryption. We use several tunnel configurations depending on the locations of users and level of security needed.

Split tunneling

Split tunneling allows only the traffic destined for the Microsoft corporate network to be routed through the VPN tunnel, and all internet traffic goes directly through the internet without traversing the VPN tunnel or infrastructure. Our migration to Office 365 and Azure has dramatically reduced the need for connections to the corporate network. We rely on the security controls of applications hosted in Azure and services of Office 365 to help secure this traffic. For end point protection, we use Microsoft Defender Advanced Threat Protection on all clients. In our VPN connection profile, split tunneling is enabled by default and used by the majority of Microsoft employees. Learn more about Office 365 split tunnel configuration.

Full tunneling

Full tunneling routes and encrypts all traffic through the VPN. There are some countries and business requirements that make full tunneling necessary. This is accomplished by running a distinct VPN configuration on the same infrastructure as the rest of the VPN service. A separate VPN profile is pushed to the clients who require it, and this profile points to the full-tunnel gateways.

Full tunnel with high security

Our IT employees and some developers access company infrastructure or extremely sensitive data. These users are given Privileged Access Workstations, which are secured, limited, and connect to a separate highly controlled infrastructure.

Applying and enforcing policies

In Microsoft Digital, the Conditional Access administrator is responsible for defining the VPN Compliance Policy for domain-joined Windows 10 desktops, including enterprise laptops and tablets, within the Microsoft Azure Portal administrative experience. This policy is then published so that the enforcement of the applied policy can be managed through Microsoft Endpoint Manager. Microsoft Endpoint Manager provides policy enforcement, as well as certificate enrollment and deployment, on behalf of the client device.

For more information about policies, see VPN and Conditional Access.

Early adopters help validate new policies

With every new Windows 10 update, we rolled out a pre-release version to a group of about 15,000 early adopters a few months before its release. Early adopters validated the new credential functionality and used remote access connection scenarios to provide valuable feedback that we could take back to the product development team. Using early adopters helped validate and improve features and functionality, influenced how we prepared for the broader deployment across Microsoft, and helped us prepare support channels for the types of issues that employees might experience.

Measuring service health

We measure many aspects of the VPN service and report on the number of unique users that connect every month, the number of daily users, and the duration of connections. We have invested heavily in telemetry and automation throughout the Microsoft network environment. Telemetry allows for data-driven decisions in making infrastructure investments and identifying potential bandwidth issues ahead of saturation.

Using Power BI to customize operational insight dashboards

Our service health reporting is centralized using Power BI dashboards to display consolidated data views of VPN performance. Data is aggregated into an SQL Azure data warehouse from VPN appliance logging, network device telemetry, and anonymized device performance data. These dashboards, shown in the next two graphics below, are tailored for the teams using them.

Global VPN status dashboard.

Microsoft Power BI reporting dashboards.

With our optimizations in VPN connection profiles and improvements in the infrastructure, we have seen significant benefits:

Reduced VPN requirements. By moving to cloud-based services and applications and implementing split tunneling configurations, we have dramatically reduced our reliance on VPN connections for many users at Microsoft.
Auto-connection for improved user experience. The VPN connection profile automatically configured for connection and authentication types have improved mobile productivity. They also improve the user experience by providing employees the option to stay connected to VPN—without additional interaction after signing in.
Increased capacity and reliability. Reducing the quantity of VPN sites and investing in dedicated VPN hardware has increased our capacity and reliability, now supporting over 500,000 simultaneous connections.
Service health visibility. By aggregating data sources and building a single pane of glass in Microsoft Power BI, we have visibility into every aspect of the VPN experience.

The post Enhancing VPN performance at Microsoft appeared first on Inside Track Blog.

Moving to next-generation SIEM at Microsoft with Microsoft Sentinel

Inside Track staff — Thu, 09 Jan 2025 15:05:50 +0000

Our internal security team works diligently 24 hours a day, 7 days a week to help protect Microsoft IP, its employees, and its overall business health from security threats.

We recently implemented Microsoft Sentinel to replace a preexisting, on-premises solution for security information and event management (SIEM). With Microsoft Sentinel, we can ingest and appropriately respond to more than 20 billion cybersecurity events per day.

Microsoft Sentinel supplies cloud-scale SIEM functionality that allows integration with crucial systems, provides accurate and timely response to security threats, and supports the SIEM requirements of our team.

Our team is responsible for maintaining security and compliance standards across Microsoft. Managing the massive volume of incoming security-related data is critical to Microsoft’s business health. Historically, we have performed SIEM using a third-party tool hosted on-premises in Microsoft datacenters.

However, we recognized several areas in which they could improve their service by implementing a next-generation SIEM tool. Some of the challenges when using the old tool included:

Limited ability to accommodate increasing incoming traffic. Ingesting data into the previous SIEM tool was time consuming due to limited ingestion processes. As the number of incoming cybersecurity events continued to grow, it became more evident that the solution we were using wouldn’t be able to maintain the necessary throughput for data ingestion.
On-premises scalability and agility issues. The previous solution’s on-premises nature limited our ability to scale effectively and respond to changing business and security requirements at the speed that we required.
Increased training requirements. We needed to invest more resources in training and onboarding with the previous solution, because it was on-premises and customized to meet our requirements. If we recruited employees from outside Microsoft, they needed to learn the new solution—including its complex on-premises architecture—from the ground up.

As part of our ongoing digital transformation, we’re moving to cloud-based solutions with proven track records and active, customer-facing development and involvement. We need our technology stack to evolve at the speed of our business.

Modernizing SIEM with Microsoft Sentinel

In response to the challenges presented, we began assessing options for a new SIEM environment that would address the challenges positioning our team to manage continued growth of the cybersecurity landscape.

Feature assessment and planning

In partnership with the Microsoft Sentinel product team, our internal security division assessed whether Sentinel would be a suitable replacement for our previous solution. Sentinel is a Microsoft-developed, cloud-native enterprise SIEM solution that uses the cloud’s agility and scalability to ensure rapid threat detection and response through:

Elastic scaling.
AI–infused detection capability.
A broad set of out-of-the-box data connectivity and ingestion solutions.

To move to Microsoft Sentinel, we needed to verify that equivalent features and capabilities were available in the new environment. We aligned security teams across Microsoft to ensure that we met all requirements. Some of these teams had mature monitoring and detection definitions in place, and we needed to understand those scenarios to accommodate feature-performance requirements. The issues that our previous solution presented narrowed our focus with respect to whether Sentinel would work, including throughput, agility, and usability.

Throughout the assessment period and into migration, we worked closely with the Microsoft Sentinel product team to ensure that Microsoft Sentinel could provide the feature set we required. Our engagement with the Microsoft Sentinel team addressed two sets of needs simultaneously. We received significant incident-response benefits from Microsoft Sentinel while the product team worked with us as if we were a customer. This close collaboration meant that the product team could identify what enterprise-scale customers needed more quickly.

Not only were our requirements met, but we were able to provide feedback and testing for the Microsoft Sentinel product team. This helped them better serve their large customers that have similar challenges, requirements, and needs.

Defining and refining SIEM detections

As we developed standards that met our new requirements, we also evaluated our previous SIEM solution’s functionality to determine how it would transition to Microsoft Sentinel. We examined three key aspects of incoming security data ingestion and event detection:

Data-source validity. We pull incoming SIEM data from hundreds of data locations across Microsoft. As time has passed, some of these data sources remained valid but others no longer provided relevant SIEM data. We assessed our entire data-source footprint to determine which data sources Microsoft Sentinel should ingest and which ones were no longer required. This process helped us to better understand our data-source environment and refine the amount of data ingested. There were several data sources that we weren’t ingesting with the previous solution because of performance limitations. We knew that we wanted to increase ingestion capability when moving to Microsoft Sentinel.
Detection importance. Our team examined event-detection definitions used throughout the previous SIEM solution, so we could understand how detections were being performed, which detection definitions generated alerts, and the volume of alerts from each detection. This information helped us identify the most important detection definitions, so we could prioritize these definitions in the migration process.
Detection validity. Our security teams evaluated the list of detections from our SIEM environment so we could identify invalid detections or detection definitions that required refinement. This helped us create a more streamlined set of detections when moving into Microsoft Sentinel, including combining multiple detection definitions and removing several detections.

Throughout this process, we worked with the Microsoft Security Operations team to evaluate detections end-to-end. They got involved in the detection and data-source refinement process and were exposed to how these detections and data sources would work in Microsoft Sentinel.

Implementation

After feature parity and throughput capabilities were confirmed, we began the migration process from our previous solution to Microsoft Sentinel. Based on our initial testing, we added several implementation steps to ensure that our Sentinel environment would readily meet our security environment’s needs.

Onboarding data sources

Properly onboarding data sources was a critical component in our implementation and one of the biggest benefits of the Microsoft Sentinel environment. With the massive amount of default connectors available in Sentinel, we were able to connect to most of our data sources without further customization. This included cloud data sources such as Microsoft Azure Active Directory, Microsoft Defender for Cloud, and Microsoft Defender. However, it also included on-premises data sources, such as Windows Events and firewall systems.

We also connected to several enrichment sources that supplied more information for threat-hunting queries and detections. These enrichments sources included data from human-resources systems and other nontypical data sources. We used playbooks to create many of these connections.

We keep Microsoft Sentinel data in hot storage for 90 days, using Kusto Query Language (KQL) queries for detections, hunting, and investigation. We also use Microsoft Azure Data Explorer for warm storage and Microsoft Azure Data Lake for cold storage and retrieval for up to two years.

Refining detections

Based on testing, we refined our detection definitions further in Sentinel to support better alert suppression and aggregation. We didn’t want to overwhelm our Security Operations team with incidents. Therefore, we refined our detection definitions to include suppression logic when notification wasn’t required and aggregation logic to ensure that similar and related events were grouped together and not surfaced as multiple, individual alerts.

Increasing scale with the cloud

We used dedicated clusters for Microsoft Azure Monitor Log Analytics to support the data-ingestion scalability we required. At a large enterprise scale, our previous solution was exceeding its capacity at 10 billion events per day. With dedicated clusters, we were able to accommodate that initial volume and add additional data sources to improve alert detection, thereby increasing our event ingestion to > 20 billion events per day.

Customizing functionality

Our environment required several customizations to Sentinel functionality, which we implemented by using standard Microsoft Sentinel features and extension capabilities to meet our needs while still staying within the boundaries of standard functionality. Using common features for customization made our changes to Sentinel easy to document and helped our security operations team better and more quickly understand and use the new features. We made several important customizations including:

Integration with our IT service-management system. We integrated Microsoft Sentinel with our security incident management solution. This had a two-fold positive effect, as it extended Sentinel information into our case-management environment and provided our support teams with exactly the information they need, regardless of which tool they’re in.
Implementation of Microsoft Defender for Cloud playbook to support scale. We used a playbook to automate the addition of more than 20,000 Azure subscriptions to Microsoft Defender for Cloud.
High volume ingestion with Microsoft Azure Event Hub and Microsoft Azure Virtual Machine scales sets. We built a custom solution that ingested the large volume of events from our firewall systems that exceeded the capabilities of on-premises collection agents. With the new solution, we can ingest more than 100,000 events per second into Microsoft Sentinel from on-premises firewalls.

Architecture for the new SIEM solution using Microsoft Sentinel.

We’ve experienced several important benefits from using Microsoft Sentinel as our SIEM tool, including:

Faster query performance. Our query speed with Microsoft Sentinel improved drastically. It’s 12 times faster than it was with the previous solution, on average, and is up to 100 times faster with some queries.
Simplified training and onboarding. Using a cloud-based, commercially available solution like Microsoft Sentinel means it’s much simpler to onboard and train employees. Our security engineers don’t need to understand the complexities of an underlying on-premises architecture. They simply start using Sentinel for security management.
Greater feature agility. Microsoft Sentinel’s feature set and capabilities iterate at a much faster rate than we could maintain with our on-premises developed solution.
Improved data ingestion. Microsoft Sentinel’s out-of-the box connectors and integration with the Microsoft Azure platform make it much easier to include data from anywhere and extend Sentinel functionality to integrate with other enterprise tools. On average, it’s 18 times faster to ingest data into Sentinel using a built-in data connector than it was with our previous solution.

Throughout our Microsoft Sentinel implementation, we reexamined and refined our approach to SIEM. At Microsoft’s scale, very few implementations go exactly as planned from beginning to end. However, we derived several points with our Sentinel implementation, including:

More testing enables more refinement. We tested our detections, data sources, and processes extensively. The more we tested, the better we understood how we could improve test results. This, in turn, meant more opportunities to refine our approach.
Customization is necessary but achievable. We capitalized on the flexibility of Microsoft Sentinel and the Microsoft Azure platform often during our implementation. We found that while out-of-the-box features didn’t meet all our requirements, we were able to create customizations and integrations to meet the needs of our security environment.
Large enterprise customers might require a dedicated cluster. We used dedicated Log Analytics clusters to allow ingestion of nearly 20 billion events per day. In other large enterprise scenarios, moving from a shared cluster to a dedicated cluster might be necessary for adequate performance.

The first phase of our migration is complete! However, there’s still more to discover with Microsoft Sentinel. We’re taking advantage of new ways to engage and interact with connected datasets and using machine learning to manage some of our most complex detections. As we continue to grow our SIEM environment in Sentinel, we’re capitalizing on Sentinel’s cloud-based benefits to help meet our security needs at an enterprise level. Sentinel provides our security operations teams with a single SIEM solution that has all the tools they need to successfully complete and manage security events and investigations.

Want more information? Email us and include a link to this story and we’ll get back to you.
Please share your feedback with us—take our survey and let us know what kind of content is most useful to you.

The post Moving to next-generation SIEM at Microsoft with Microsoft Sentinel appeared first on Inside Track Blog.

Using a Zero Trust strategy to secure Microsoft’s network during remote work

Aleenah Ansari — Fri, 03 Jan 2025 14:59:49 +0000

Microsoft’s cloud-first strategy enables most Microsoft employees to directly access applications and services via the internet, but remote workers still use the company’s virtual private network (VPN) to access some corporate resources and applications when they’re outside of the office.

This became increasingly apparent when Microsoft prepared for its employees to work remotely in response to the global pandemic. VPN usage increased by 70 percent, which coincides with the significant spike in users working from home daily.

So then, how is Microsoft ensuring that its employees can securely access the applications they need?

With split tunneling and a Zero Trust security strategy.

As part of the company’s Zero Trust security strategy, employees in Microsoft Digital redesigned the VPN infrastructure by adopting a split-tunneled configuration that further enables the company’s workloads moving to the cloud.

“Adopting split tunneling has ensured that Microsoft employees can access core applications over the internet using Microsoft Azure and Microsoft Office 365,” says Steve Means, a principal cloud network engineering manager in Microsoft Digital. “This takes pressure off the VPN and gives employees more bandwidth to do their job securely.”

Eighty percent of remote working traffic flows to cloud endpoints where split tunneling is enabled, but the rest of the work that employees do remotely—which needs to be locked down on the corporate network—still goes through the company’s VPN.

“We need to make sure our VPN infrastructure has the same level of corporate network security as applications in the cloud,” says Carmichael Patton, a principal security architect on Microsoft’s Digital Security and Resilience team. “We’re applying the same Zero Trust principles to our VPN traffic, by applying conditional access to each connection.”

[Learn how Microsoft rebuilt its VPN infrastructure. Learn how Microsoft transitioned to modern access architecture with Zero Trust. Read how Microsoft is approaching Zero Trust Networking.]
For a transcript, please view the video on YouTube: https://www.youtube.com/watch?v=bleFoL0NkVM, select the “More actions” button (three dots icon) below the video, and then select “Show transcript.”

Experts from Microsoft Digital answer frequently asked questions around how VPN, modern device management, and Zero Trust come together to deliver a world class remote work platform.

Securing remote workers with device management and conditional access

Moving most of the work that employees require to the cloud only became possible after the company adopted modern security controls that focus on securing devices.

“We no longer rely solely on the network to manage firewalls,” Patton says. “Instead, each application that an employee uses enforces its own security management—this means employees can only use an app after it verifies the health of their device.”

To support this transformed approach to security, Microsoft adopted a Zero Trust security model, which manages risk and secures working remotely by managing the device an employee uses.

“Before an employee can access an application, they must enroll their device, have relevant security policies, and have their device health validated,” Patton says. “This ensures that only registered devices that comply with company security policies can access corporate resources, which reduces the risk of malware and intruders.”

The team also recommends using a dynamic and scalable authentication mechanism, like Azure Active Directory, to avoid the trouble of certificates.

While most employees rely on our standard VPN infrastructure, Microsoft has specific scenarios that call for additional security when accessing company infrastructure or sensitive data. This is the case for Microsoft Digital employees in owner and contributor roles that are configured on a Microsoft Azure subscription as well as employees who make changes to customer-facing production services and systems like firewalls and network gear. To access corporate resources, these employees use Privileged Access Workstations, a dedicated operating system for sensitive tasks, to access a highly secure VPN infrastructure.

Phil Suver, a principal PM manager in Microsoft Digital, says working remotely during the global pandemic gives employees a sense of what the Zero Trust experience will be like when they return to the office.

“Hardened local area networks that previously accessed internal applications are a model of the past,” Suver says. “We see split tunneling as a gateway to prepare our workforce for our Zero Trust Networking posture, where user devices are highly protected from vulnerability and employees use the internet for their predominant workload.”

It’s also important to review your VPN structure for updates.

“When evaluating your VPN configuration, identify the highest compliance risks to your organization and make them the priority for controls, policies, and procedures,” Patton says. “Understand the security controls you give up by not flowing the connections through your internal infrastructure. Then, look at the controls you’re able to extend to the clients themselves, and find the right balance of risk and productivity that fits your organization.”

Keeping your devices up-to-date with split tunneling

Enterprises can also optimize patching and manage update compliance using services like Microsoft Endpoint Manager, Microsoft Intune, and Windows Update for Business. At Microsoft, a split-tunneled VPN configuration allows these services to keep devices current without requiring a VPN tunnel to do it.

“With a split-tunneled configuration, update traffic comes through the internet,” says Mike Carlson, a principal service engineering manager in Microsoft Digital. “This improves the user experience for employees by freeing up VPN bandwidth during patch and release cycles.”

At Microsoft, device updates fall into two categories: feature updates and quality updates. Feature updates occur every six months and encompass new operating system features, functionality, and major bug fixes. In contrast, monthly quality updates include security and reliability updates as well as small bug fixes. To balance both user experience and security, Microsoft’s current configuration of Windows Update for Business prompts Microsoft employees to update within 48 hours for quality updates and 7 days for feature updates.

“Not only can Windows Update for Business isolate update traffic from the VPN connection, but it can also provide better compliance management by using the deadline feature to adjust the timing of quality and feature updates,” Carlson says. “We can quickly drive compliance and have more time to focus on employees that may need additional support.”

Evaluating your VPN configuration

When your enterprise evaluates which VPN configuration works best for your company and users, you must evaluate their workflows.

“Some companies may need a full tunnel configuration, and others might want something cloud-based,” Means says. “If you’re a Microsoft customer, you can work with your sales team to request a customer engagement with a Microsoft expert to better understand our implementation and whether it would work for your enterprise.”

Means also said that it’s important to assess the legal requirements of the countries you operate in, which is done at Microsoft using Azure Traffic Manager. For example, split tunneling may not be the right configuration for countries with tighter controls over how traffic flows within and beyond their borders.

Suver also emphasized the importance of understanding the persona of your workforce, suggesting you should assess the workloads they may need to use remotely and their bandwidth capacity. You should also consider the maximum number of concurrent connections your VPN infrastructure supports and think through potential seasonal disruptions.

“Ensure that you’ve built for a snow day or a pandemic of a global nature,” Suver says. “We’ve had to send thousands of customer support agents to work from home. Typically, they didn’t use VPN to have voice conversations with customers. Because we sized and distributed our infrastructure for a global workforce, we were able to quickly adapt to the dramatic shift in workloads that have come from our employees working from home during the pandemic. Anticipate some of the changes in workflow that might occur, and test for those conditions.”

It’s also important to collect user connection and traffic data in a central location for your VPN infrastructure, to use modern visualization services like Microsoft Power BI to identify hot spots before they happen, and to plan for growth.

Means’s biggest piece of advice?

Focus on what your enterprise needs and go from there.

“Identify what you want to access and what you want to protect,” he says. “Then build to that model.”

Tips for retooling VPN at your company

Azure offers a native, highly-scalable VPN gateway, and the most common third-party VPN and Software-Defined Wide Area Network virtual appliances in the Azure Marketplace.

For more information on these and other Azure and Office network optimizing practices, please see:

Here are additional resources to learn more about how Microsoft applies networking best practices and supports a Zero Trust security strategy:

The post Using a Zero Trust strategy to secure Microsoft’s network during remote work appeared first on Inside Track Blog.

Verifying device health at Microsoft with Zero Trust

Inside Track staff — Fri, 06 Sep 2024 13:51:32 +0000

Here at Microsoft, we’re using our Zero Trust security model to help us transform the way we verify device health across all devices that access company resources. Zero Trust supplies an integrated security philosophy and end-to-end strategy that informs how our company protects its customers, data, employees, and business in an increasingly complex and dynamic digital world.

Verified device health is a core pillar of our Microsoft Digital Zero Trust security model. Because unmanaged devices are an easy entry point for bad actors, ensuring that only healthy devices can access corporate applications and data is vital for enterprise security. As a fundamental part of our Zero Trust implementation, we require all user devices accessing corporate resources to be enrolled in device-management systems.

Verified devices support our broader framework for Zero Trust, alongside the other pillars of verified identity, verified access, and verified services.

The four pillars of Microsoft’s Zero Trust model.

[Explore verifying identity in a Zero Trust model. | Unpack implementing a Zero Trust security model at Microsoft. | Discover enabling remote work: Our remote infrastructure design and Zero Trust. | Watch our Enabling remote work infrastructure design using Zero Trust video.]

Verifying the device landscape at Microsoft

The device landscape at Microsoft is characterized by a wide variety of devices. We have more than 220,000 employees and additional vendors and partners, most of whom use multiple devices to connect to our corporate network. We have more than 650,000 unique devices enrolled in our device-management platforms, including devices running Windows, iOS, Android, and macOS. Our employees need to work from anywhere, including customer sites, cafes, and home offices. The transient nature of employee mobility poses challenges to data safety. To combat this, we are implementing device-management functionality to enable the mobile-employee experience—confirming identity and access while ensuring that the devices that access our corporate resources are in a verified healthy state according to the policies that govern safe access to Microsoft data.

Enforcing client device health

Device management is mandatory for any device accessing our corporate data. The Microsoft Endpoint Manager platform enables us to enroll devices, bring them to a managed state, monitor the devices’ health, and enforce compliance against a set of health policies before granting access to any corporate resources. Our device health policies verify all significant aspects of device state, including encryption, antimalware, minimum OS version, hardware configuration, and more. Microsoft Endpoint Manager also supports internet-based device enrollment, which is a requirement for the internet-first network focus in the Zero Trust model.

We’re using Microsoft Endpoint Manager to enforce health compliance across the various health signals and across multiple client device operating systems. Validating client device health is not a onetime process. Our policy-verification processes confirm device health each time a device tries to access corporate resources, much in the same way that we confirm the other pillars, including identity, access, and services. We’re using modern endpoint protection configuration on every managed device, including preboot and postboot protection and cross-platform coverage. Our modern management environment includes several critical components:

Microsoft Azure Active Directory (Azure AD) for core identity and access functionality in Microsoft Intune and the other cloud-based components of our modern management model, including Microsoft Office 365, Microsoft Dynamics 365, and many other Microsoft cloud offerings.
Microsoft Intune for policy-based configuration management, application control, and conditional-access management.
Clearly defined mobile device management (MDM) policy. Policy-based configuration is the primary method for ensuring that devices have the appropriate settings to help keep the enterprise secure and enable productivity-enhancement features.
Windows Update for Business is configured as the default for operating system and application updates for our modern-managed devices.
Microsoft Defender for Endpoint (MDE) is configured to protect our devices, send compliance data to Azure AD Conditional Access, and supply event data to our security teams.
Dynamic device and user targeting for MDM enables us to supply a more flexible and resilient environment for the application of MDM policies. It enables us to flexibly apply policies to devices as they move into different policy scopes.

Providing secure access methods for unmanaged devices

While our primary goal is to have users connect to company resources by using managed devices, we also realize that not every user’s circumstances allow for using a completely managed device. We’re using cloud-based desktop virtualization to provide virtual machine–based access to corporate data through a remote connection experience that enables our employees to connect to the data that they need from anywhere, using any device. Desktop virtualization enables us to supply a preconfigured, compliant operating system and application environment in a pre-deployed virtual machine that can be provisioned on demand.

Additionally, we’ve created a browser-based experience allowing access, with limited functionality, to some Microsoft 365 applications. For example, an employee can open Microsoft Outlook in their browser and read and reply to emails, but they will not be able to open any documents or browse any Microsoft websites without first enrolling their devices into management.

How we treat the devices that our employees and partners use to access corporate data is an integral component of our Zero Trust model. By verifying device health, we extend the enforcement capabilities of Zero Trust. A verified device, associated with a verified identity, has become the core checkpoint across our Zero Trust model. We’re currently working toward achieving better control over administrative permissions on client devices and a more seamless device enrollment and management process for every device, including Linux–based operating systems. As we continue to strengthen our processes for verifying device health, we’re strengthening our entire Zero Trust model.

The post Verifying device health at Microsoft with Zero Trust appeared first on Inside Track Blog.

Implementing a Zero Trust security model at Microsoft

Inside Track staff — Tue, 23 Jul 2024 08:01:02 +0000

At Microsoft, our shift to a Zero Trust security model more than five years ago has helped us navigate many challenges.

The increasing prevalence of cloud-based services, mobile computing, internet of things (IoT), and bring your own device (BYOD) in the workforce have changed the technology landscape for the modern enterprise. Security architectures that rely on network firewalls and virtual private networks (VPNs) to isolate and restrict access to corporate technology resources and services are no longer sufficient for a workforce that regularly requires access to applications and resources that exist beyond traditional corporate network boundaries. The shift to the internet as the network of choice and the continuously evolving threats led us to adopt a Zero Trust security model internally here at Microsoft. Though our journey began many years ago, we expect that it will continue to evolve for years to come.

[Learn how we’re transitioning to modern access architecture with Zero Trust. Find out how to enable a remote workforce by embracing Zero Trust security. Running on VPN: Learn how we’re keeping our remote workforce connected.]
For a transcript, please view the video on YouTube: https://www.youtube.com/watch?v=ZVLlEj2So4E, select the “More actions” button (three dots icon) below the video, and then select “Show transcript.”

Carmichael Patton, a security architect at Microsoft, shares the work that his team, Digital Security and Resiliency, has been doing to support a Zero Trust security model.

The Zero Trust model

Based on the principle of verified trust—in order to trust, you must first verify—Zero Trust eliminates the inherent trust that is assumed inside the traditional corporate network. Zero Trust architecture reduces risk across all environments by establishing strong identity verification, validating device compliance prior to granting access, and ensuring least privilege access to only explicitly authorized resources.

Zero Trust requires that every transaction between systems (user identity, device, network, and applications) be validated and proven trustworthy before the transaction can occur. In an ideal Zero Trust environment, the following behaviors are required:

Identities are validated and secure with multifactor authentication (MFA) everywhere. Using multifactor authentication eliminates password expirations and eventually will eliminate passwords. The added use of biometrics ensures strong authentication for user-backed identities.
Devices are managed and validated as healthy. Device health validation is required. All device types and operating systems must meet a required minimum health state as a condition of access to any Microsoft resource.
Telemetry is pervasive. Pervasive data and telemetry are used to understand the current security state, identify gaps in coverage, validate the impact of new controls, and correlate data across all applications and services in the environment. Robust and standardized auditing, monitoring, and telemetry capabilities are core requirements across users, devices, applications, services, and access patterns.
Least privilege access is enforced. Limit access to only the applications, services, and infrastructure required to perform the job function. Access solutions that provide broad access to networks without segmentation or are scoped to specific resources, such as broad access VPN, must be eliminated.

Zero Trust scenarios

We have identified four core scenarios at Microsoft to help achieve Zero Trust. These scenarios satisfy the requirements for strong identity, enrollment in device management and device-health validation, alternative access for unmanaged devices, and validation of application health. The core scenarios are described here:

Scenario 1: Applications and services have the mechanisms to validate multifactor authentication and device health.
Scenario 2: Employees can enroll devices into a modern management system which guarantees the health of the device to control access to company resources.
Scenario 3: Employees and business guests have a method to access corporate resources when not using a managed device.
Scenario 4: Access to resources is limited to the minimum required—least privilege access—to perform a specified function.

Zero Trust scope and phases

We’re taking a structured approach toward Zero Trust, in an effort that spans many technologies and organizations, and requires investments that will carry over multiple years. The figure below represents a high-level view of the Zero Trust goals that we aim to fully achieve over the next two to three years, grouped into our core Zero Trust pillars. We will continually evaluate these goals and adjust them if necessary. While these goals don’t represent the full scope of the Zero Trust efforts and work streams, they capture the most significant areas of Zero Trust effort at Microsoft.

The major goals for each Zero Trust pillar.

Scope

Our initial scope for implementing Zero Trust focused on common corporate services used across our enterprise—our employees, partners, and vendors. Our Zero Trust implementation targeted the core set of applications that Microsoft employees use daily (e.g., Microsoft Office apps, line-of-business apps) on platforms like iOS, Android, MacOS, and Windows (Linux is an eventual goal). As we have progressed, our focus has expanded to include all applications used across Microsoft. Any corporate-owned or personal device that accesses company resources must be managed through our device management systems.

Verify identity

To begin enhancing security for the environment, we implemented MFA using smart cards to control administrative access to servers. We later expanded the multifactor authentication requirement to include all users accessing resources from outside the corporate network. The massive increase in mobile devices connecting to corporate resources pushed us to evolve our multifactor authentication system from physical smart cards to a phone-based challenge (phone-factor) and later into a more modern experience using the Microsoft Azure Authenticator application.

The most recent progress in this area is the widespread deployment of Windows Hello for Business for biometric authentication. While Windows Hello hasn’t completely eliminated passwords in our environment, it has significantly reduced password usage and enabled us to remove our password-expiration policy. Additionally, multifactor authentication validation is required for all accounts, including guest accounts, when accessing Microsoft resources.

Verify device

Our first step toward device verification was enrolling devices into a device-management system. We have since completed the rollout of device management for Windows, Mac, iOS, and Android. Many of our high-traffic applications and services, such as Microsoft 365 and VPN, enforce device health for user access. Additionally, we’ve started using device management to enable proper device health validation, a foundational component that allows us to set and enforce health policies for devices accessing Microsoft resources. We’re using Windows Autopilot for device provisioning, which ensures that all new Windows devices delivered to employees are already enrolled in our modern device management system.

Devices accessing the corporate wireless network must also be enrolled in the device-management system. This includes both Microsoft–owned devices and personal BYOD devices. If employees want to use their personal devices to access Microsoft resources, the devices must be enrolled and adhere to the same device-health policies that govern corporate-owned devices. For devices where enrollment in device management isn’t an option, we’ve created a secure access model called Microsoft Azure Virtual Desktop. Virtual Desktop creates a session with a virtual machine that meets the device-management requirements. This allows individuals using unmanaged devices to securely access select Microsoft resources. Additionally, we’ve created a browser-based experience allowing access to some Microsoft 365 applications with limited functionality.

There is still work remaining within the verify device pillar. We’re in the process of enabling device management for Linux devices and expanding the number of applications enforcing device management to eventually include all applications and services. We’re also expanding the number of resources available when connecting through the Virtual Desktop service. Finally, we’re expanding device-health policies to be more robust and enabling validation across all applications and services.

Verify access

In the verify access pillar, our focus is on segmenting users and devices across purpose-built networks, migrating all Microsoft employees to use the internet as the default network, and automatically routing users and devices to appropriate network segments. We’ve made significant progress in our network-segmentation efforts. We have successfully deployed several network segments, both for users and devices, including the creation of a new internet-default wireless network across all Microsoft buildings. All users have received policy updates to their systems, thus making this internet-based network their new default.

As part of the new wireless network rollout, we also deployed a device-registration portal. This portal allows users to self-identify, register, or modify devices to ensure that the devices connect to the appropriate network segment. Through this portal, users can register guest devices, user devices, and IoT devices.

We’re also creating specialized segments, including purpose-built segments for the various IoT devices and scenarios used throughout the organization. We have nearly completed the migration of our highest-priority IoT devices in Microsoft offices into the appropriate segments.

We still have a lot of work to do within the verify access pillar. We’re following the investments in our wireless networks with similar wired network investments. For IoT, we need to complete the migration of the remaining high-priority devices in Microsoft offices and then start on high-priority devices in our datacenters. After these devices are migrated, we’ll start migrating lower-priority devices. Finally, we’re building auto-detection for devices and users, which will route them to the appropriate segment without requiring registration in the device-registration portal.

Verify services

In the verify services pillar, our efforts center on enabling conditional access across all applications and services. To achieve full conditional access validation, a key effort requires modernizing legacy applications or implementing solutions for applications and services that can’t natively support conditional access systems. This has the added benefit of eliminating the dependency on VPN and the corporate network. We’ve enabled auto-VPN for all users, which automatically routes users through the appropriate connection. Our goal is to eliminate the need for VPN and create a seamless experience for accessing corporate resources from the internet. With auto-VPN, the user’s system will transparently determine how to connect to resources, bypassing VPN for resources available directly from the internet or using VPN when connecting to a resource that is only available on the corporate network.

Amid the COVID-19 pandemic, a large percentage of our user population transitioned to work from home. This shift has provided increased use of remote network connectivity. In this environment, we’ve successfully identified and engaged application owners to initiate plans to make these applications or services accessible over the internet without VPN.

While we have taken the first steps toward modernizing legacy applications and services that still use VPN, we are in the process of establishing clear plans and timelines for enabling access from the internet. We also plan to invest in extending the portfolio of applications and services enforcing conditional access beyond Microsoft 365 and VPN.

Zero Trust architecture with Microsoft services

The graphic below provides a simplified reference architecture for our approach to implementing Zero Trust. The primary components of this process are Intune for device management and device security policy configuration, Microsoft Azure Active Directory (Azure AD) conditional access for device health validation, and Azure AD for user and device inventory.

The system works with Intune, by pushing device configuration requirements to the managed devices. The device then generates a statement of health, which is stored in Microsoft Azure AD. When the device user requests access to a resource, the device health state is verified as part of the authentication exchange with Azure AD.

Microsoft’s internal Zero Trust architecture.

A transition that’s paying off

Our transition to a Zero Trust model has made significant progress. Over the last several years, we’ve increased identity-authentication strength with expanded coverage of strong authentication and a transition to biometrics-based authentication by using Windows Hello for Business. We’ve deployed device management and device-health validation capabilities across all major platforms and will soon add Linux. We’ve also launched a Windows Virtual Desktop system that provides secure access to company resources from unmanaged devices.

As we continue our progress, we’re making ongoing investments in Zero Trust. We’re expanding health-validation capabilities across devices and applications, increasing the Virtual Desktop features to cover more use cases, and implementing better controls on our wired network. We’re also completing our IoT migrations and segmentation and modernizing or retiring legacy applications to enable us to deprecate VPN.

Each enterprise that adopts Zero Trust will need to determine what approach best suits their unique environment. This includes balancing risk profiles with access methods, defining the scope for the implementation of Zero Trust in their environments, and determining what specific verifications they want to require for users to gain access to their company resources. In all of this, encouraging the organization-wide embrace of Zero Trust is critical to success, no matter where you decide to begin your transition.

Collect telemetry and evaluate risks, and then set goals.
Get to modern identity and MFA—then onboard to AAD.
For conditional access enforcement, focus on top used applications to ensure maximum coverage.
Start with simple policies for device health enforcement such as device lock or password complexity.
Run pilots and ringed rollouts. Slow and steady wins the race.
Migrate your users to the Internet and monitor VPN traffic to understand internal dependencies.
Focus on user experience as it is critical to employee productivity and morale. Without adoption, your program will not be a success.
Communication is key—bring your employees on the journey with you!
Assign performance indicators and goals for all workstreams and elements, including employee sentiment.

Share your feedback with us—take our survey and let us know what kind of content is most useful to you.

The post Implementing a Zero Trust security model at Microsoft appeared first on Inside Track Blog.

Transforming Microsoft’s enterprise network with next-generation connectivity

Inside Track staff — Mon, 25 Mar 2024 16:00:37 +0000

Next-generation connectivity is enabling us to transform our internal enterprise network here at Microsoft.

Deploying a more agile, secure, and effective network environment across Microsoft is empowering our employees to thrive in our new hybrid world. This article describes how we’re implementing this new network strategy, including goals, action areas, expected results, and a brief evaluation of anticipated future state and immediate next steps.

And this transformation is coming at a good time.

The need for digital transformation is evident across all industries, and our now largely mobile and remote workforce has created challenges and new requirements for our network environment.

Fortunately, employee productivity and satisfaction with connectivity remain high, despite these challenges as remote work is the new normal. During the pandemic, for example, we—our Microsoft Digital (MSD) team—tallied up to 190,000 unique remote connections daily.

Our next-generation connectivity strategy must account for both traditional in-building experience and hybrid experiences, and it must accommodate critical industry factors driving the usage of our network resources, including:

Advanced speed and scale of threats caused by cyberattacks.
The adoption of the cloud as our primary endpoint for business applications, security monitoring, and employee productivity—supplanting traditional on-premises infrastructure and network patterns.
The continuing impact of remote work and emerging hybrid scenarios.

Workplace modernization efforts are creating a surge of digital Internet of Things (IoT) devices and sensors on the network in Microsoft buildings and campuses.

With these factors in mind, we’re making changes to adapt to new traffic patterns, use cases, security requirements, and other demands of our network infrastructure. Legacy approaches to network operations won’t provide adequate service and security.

We’re embracing cloud WAN and edge security models associated with user identities, device states, and applications that aren’t directly dependent on the physical network infrastructure. We’re efficiently scaling network deployment and operations with investments in software-defined infrastructure, network automation, data-driven insights and AIOps.

Software-defined infrastructure has brought network and application security to the places where end users can consume them when needed, regardless of where they’re physically located. Automation and AIOps has started to eliminate some manual operations and eventually could encompass every part of the engineering and operational life cycle, including providing the capability to respond quickly to changing business needs and live site incidents.

[Discover the lessons we’ve learned in engineering Zero Trust networking. | Unpack our Zero Trust networking lessons for leaders. | Explore how we’re implementing a Zero Trust security model at Microsoft.]

Focus points for next-generation connectivity

Transforming our network to support next-generation connectivity requires us to reimagine many of our traditional connectivity models. We’re focusing on several different areas to help us reduce our legacy network dependencies and move to a more modern connectivity design:

Shift to the internet as the primary network transport. We’re using the internet as the primary network for most of our end-user and remote-office traffic. 90+ percent of corporate data infrastructure that the average knowledge worker at Microsoft uses resides in the cloud. Using the internet creates simpler, more fluid access to cloud-based infrastructure and services. Microsoft Entra Private Access solutions use the internet to bring all traffic into our software-defined cloud edge for all end-user connectivity, removing dependencies on legacy VPN solutions. Microsoft Azure Virtual Desktop, Windows 365 Cloud PC, and other virtualization solutions allow users to connect in a secure way to corporate resources without using VPN.
Reduce our corporate network footprint and dependencies. We’re transforming our enterprise perimeter from fixed resources and limited locations to a grouping of Zero Trust segments and services such as SASE and Microsoft Azure Firewall that provide dynamic edge capabilities widely available at all our cloud points of presence. Zero Trust moves beyond the notion of the physical network as a security perimeter and replaces it with identity and device-health evaluation in a set of services delivered from the cloud. Modern access controls will be able to apply policies independent of network location and connectivity method. Our users must be able to securely connect to corporate applications and services from any network, unless explicitly prohibited. Controls and monitoring must be oriented toward the resources that we want to protect, not only the infrastructure on which the resources reside.
Reduce MPLS WAN infrastructure and dependencies. Next-generation connectivity shifts traffic away from traditional Multiprotocol Label Switching Wide Area Network (MPLS WAN) aggregation hubs into a more distributed, internet-based transport model. Our firewall, network edge, and VPN aggregation points will no longer require MPLS backbone connectivity. The internet is the network of choice, and our users access their applications at the endpoint by using secure delivery and modern cloud edge logic. Moving our engineering build systems into the cloud will greatly reduce the need for large-capacity private backbone connectivity.
Align with emerging security standards. Increasing security threat profiles that attempt to access our critical systems and supply chain further reinforced our Zero Trust networking strategy. Introducing stronger macro and micro network segmentation that protects development and test pipelines will further advance the segmentation concepts introduced in Zero Trust.
Decrease wired user access infrastructure for end-user devices. Shifting end-user devices such as laptops from wired infrastructure to wireless helps simplify connectivity and eliminate technical debt and sustained funding. Wireless networking is the default connection method across our campuses for user-focused devices. In many cases, our wireless network supersedes our wired network infrastructure and makes it redundant. We’re investing in wireless technology for new deployments to reduce costs and foster an identity-driven and device-driven security approach across our network, while we repurpose our wired infrastructure for devices that require Power over Ethernet (PoE), such as building management systems, IoT devices, and digital signage.
Adopt modern network management practices. As our infrastructure continues to increase in complexity and size, reliance on traditional network management practices adds risk to service quality, resource efficiency, and our employee experience. We’re consolidating the scope of our management systems with an Azure-focused approach, using scalable cloud network manageability solutions where feasible. We’re also capitalizing on the vast amount of telemetry we collect to create self-healing networks that automatically and intelligently predict and address customer needs and potential platform failures using AIOps. As our network becomes increasingly software-defined, increased rigor around DevOps practices will improve the quality of check-ins, create safe deployments, and detect customer experience degradation.

Key design and architecture principles

We’re informing our strategy with several architecture and design principles for the network that will help us address our focus points while we develop the next-generation connectivity model. The key principles are outlined in the following sections.

Enforce, enhance, and expand Zero Trust

A Zero Trust architecture model reduces the risk of lateral movement in any network environment. Identities are validated and secured with multifactor authentication (MFA) everywhere. Using MFA eliminates password expirations and, eventually, passwords. The added use of biometrics ensures strong authentication for user-backed identities.

Enforceable device-health controls protect the network and require applications and services to observe and enforce our device-health policies. The health of all services and applications must be monitored to ensure proper operation and compliance and enable rapid response when those conditions aren’t met.

Least privilege access principles and network segmentation ensure that users and devices have access only to the resources required to perform their role. Both Microsoft Azure Entra ID authentication and conditional access will play a crucial role in establishing a Zero Trust model. The network will also supply mechanisms like wired port security and an 802.1x solution that allows users to register their devices for elevated access.

Adopt an internet-first, wireless-first approach for connectivity

The internet will be the default connectivity method for laptops, mobile devices, IoT applications, and most end-user processes. We’ll migrate remote offices to use the internet as their primary transport for end-user connectivity, replacing most of our MPLS hub and backbone services.

Some core workloads such as supply chain, high-risk engineering, and specific development workloads will still require private connectivity in the future. However, most users and applications will connect to the cloud via the internet. In these cases, the logical overlay can provide private connectivity over a physical underlay like the internet.

Employee workflows are moving toward flexible use of space both inside and outside the building, and that’s best facilitated through wireless connectivity.

Implement distributed, strong network services segmentation

As we migrate remote-office connectivity to internet-first, distributed services and strong network segmentation will be crucial. We’ll replace localized services such as DNS and firewall with distributed, cloud-based services, or deliver those services from a dedicated shared-services network segment.

The shared services segment will enable users to consume services by using firewall-protected data-plane access. This segment will only allow access to those ports and protocols needed for the end-devices to deliver their required functionality.

This segment will host typical network services like DNS and DHCP and management platforms but also security solutions such as identity platforms for authentication.

Optimize network connectivity for integrated voice/data services

There has been a recent acceleration in the digital transformation of our user experience and customer service interactions. Microsoft support services are accessible to customers through our Omni-Channel User Experience which includes not only channels such as voice, chat, email, social but also video interactions.

Real-time interactions will be served by peer-to-peer routing and the ability for our voice services to provide the most direct route possible. Network connectivity for voice services will, as available, include class of service (CoS), bandwidth allocation/management, route optimization and network security.

Internet first will be our preferred path for call routing with an exception for global locations where a private connection for voice is deemed necessary to guarantee quality of service (QoS) for our user experience.

Build software-defined, intelligent infrastructure

Software-defined infrastructure and network-as-code initiatives will create a more stable and agile network environment. We’re automating our network-provisioning processes to require minimal—and in some cases, zero-touch network provisioning. Network Intent will maintain network and device configuration after provisioning.

End users will be able to request their own preapproved network connectivity using just-in-time access with self-service capabilities through exposed APIs and user interfaces. We’re decoupling the network logical overlay from the physical underlay to allow users to connect to applications and allow devices to talk to each other on dedicated segments across our network while still adhering to security policies. Software-defined infrastructure will supply the flexibility needed to easily interchange connectivity modes and maintain the logical overlay without significant dependency on the network physical underlay.

Intelligent monitoring, diagnosis & self-healing capabilities

Data analytics applied to both real-time and historical telemetry help engineers to better understand the current state of our network infrastructure and anticipate future needs. Centralizing and analyzing the vast amount of telemetry we collect across the network helps us more efficiently detect, diagnose, and mitigate incidents to reduce their effects on customers.

Automated incident correlation will reduce pager-storms and incident noise for on-call engineers. Auto-correlation learns from historical incident co-occurrence patterns by using a combination of AI and machine learning techniques to correlate incoming alerts with active incidents.

Incident root cause analysis assists on-call engineers with diagnosing incidents in near real-time. Failure correlation helps engineering teams identify and fix service failures by correlating failure events with telemetry. In the future, we’ll move towards self-healing by using data and analytics to perform predictive problem management and proactive mitigation.

Secure the network infrastructure

The security controls ensuring the confidentiality, integrity, and availability of the underlying network infrastructure and the control and management stems for the network hardware will continue to evolve and protect from advanced persistent threats, vulnerabilities, and exploitation attempts. Zero-day vulnerabilities and supply-chain risks accelerate strategies to ensure that the foundation of network services remains resilient.

Administrator access to infrastructure and management tools requires the use of hardened workstations, certificate-only identities, and a multiple-approver access-request model to ensure that least privilege access is enforced. We’ll continue to segment network infrastructure and management tools to limit network attack surface.

Network infrastructure control design will continue to focus on creating admin resources that are unreachable and undiscoverable from the internet, other network infrastructure elements, or any other segment except for approved management tools.

Examining network future state

While next-generation connectivity involves our entire network infrastructure, our implementation will significantly affect several aspects of the network environment. We’ve established guidelines and anticipated future states for these areas to better define the ongoing implications for our network. These aren’t mandates, but rather suggestions for where we can focus in the future to optimize our investments while we keep our long-term goals in mind.

Backbone

The layout of our backbone will stay in parallel configuration with the Azure backbone as our services and applications increasingly migrate into Microsoft Azure. However, we expect our private backbone needs to decrease as we continue to adopt the Azure backbone and move our network edge to the cloud. Our end-user segment has limited impact on the required capacity for our backbone.

As we continue to isolate and migrate our engineering workloads into the Azure cloud following our zero-trust principles the required backbone capacity will be further reduced. In development-intensive regions this determines the vast majority of capacity.

MPLS WAN

Decoupling the underlay from overlay technologies for MPLS WAN will allow us to deploy more agile and dynamic physical connectivity. This will remove the requirement for manually built static point-to-point connections. We’ll use a software-defined overlay layer while permitting the flexibility of a high-bandwidth internet connection as the physical underlay.

Replacing existing MPLS connectivity isn’t the primary objective. We want to provide our users with the connectivity that they need in the most efficient way, with the best and most secure user experience. In situations where it makes sense or where connectivity is up for renewal, we’ll examine the optimal underlay, which might be an Azure based VWAN based connection, and no longer a point-to-point or MPLS connection.

We’ll continue to evaluate more agile connectivity methods for our offices that best serve our objective of a cloud-first, mobile-first model driven by a programmable, intelligent infrastructure.

End-user segments

As we migrate to the cloud, the typical traffic flow is from the end user to the public internet and Microsoft Azure. Using the internet for our end-user segments enables our users to access the Azure cloud using the shortest network path possible and use the capacity and global connectivity of Azure’s network to find the most optimal path to their destination or application. DIA connections or shared internet connections supply the optimal solution for many offices.

Developer segments

Developer and test environments require isolation from unrelated workflows to ensure security and high confidence in product integrity. Additionally, these environments have diverse ranges of client capabilities. Some are fully managed for ease of administration, while others run unmanaged devices to benefit performance testing and simplification of debugging products under test.

By segmenting these environments away from broader shared networks, development teams can build and test their respective products in isolated environments. The goal is to allow communication with the bare minimum number of endpoints to validate code end to end, have the flexibility to host various state clients within, and reduce exposures to and from other outside environments.

Private segments

We use several private segments that are physically isolated from each other but still require logical connectivity between these physical locations. Examples of these include:

Infrastructure Admin Network. Admin interfaces for our network devices that are hosted with very strict access regulations.
Facilities Modern. Modern IoT devices that use internet-only connectivity methods, such as Microsoft Azure IoT Hub.
Facilities Legacy. IoT devices that still require connectivity to the corporate network for proper operation, such as legacy phones, video-conferencing equipment, or printers.

To connect segments, we’ll need to rely on an overlay solution like Microsoft Azure Virtual WAN (VWAN). When we replace WAN links with internet-only underlay connectivity, the any-to-any connectivity model isn’t available via the underlay. If a device in office A needs to communicate with a device in office B hosted on the same segment, VWAN will provide the software-based overlay solution to make this possible.

Approaching the internet edge

Most of our network segments will use an internet edge. This shift alone won’t affect overall load on edge processing. However, we do expect a fundamental change in load from our traditional edges to a more distributed model. We’ll also have more security delivered programmatically from the cloud in tandem with our intelligent infrastructure initiative. We will reevaluate the need for large, centralized edge stamps. Edge use and configuration will be influenced by several factors, including:

End-user devices will use localized internet breakout.
More security controls are moving into the endpoint itself.
The design and rollout of a developer segment will have a major impact on our edge in certain locations, because it makes up the majority (90 percent) of the required capacity in certain locations.
Security handles will move into the cloud if cloud-based SASE solutions become the new norm for our company.

Transforming VPN

Our current VPN solution is a hardware-based investment. Although it has proven stability and delivered great value to the company during the COVID-19 pandemic period, classic on-or-off VPN behavior that provides access to all or none of the private applications doesn’t adequately support the Zero Trust model.

Replacing our current VPN environment isn’t a standalone goal. We need to address uncontrolled connectivity in an open corporate network environment that allows all devices to communicate with one another.

The future of our remote access is a blend of direct access to Microsoft Azure-based applications and limited scope access to legacy on-premises resources. Strong identity-based security and monitored traffic flow is a solution requirement. We’ll facilitate this access through multiple technology solutions instead of a single service.

SASE solutions will allow conditional access to applications via the internet and will be based on Microsoft Entra ID authentication. This allows for effective micro-segmentation in those situations where it’s necessary. In parallel, there will be use cases for Microsoft Azure Virtual Desktop and Windows 365 Cloud PC to provide selective access to applications, for example, as an alternative to an extranet connection. A SASE agent can also be installed on these devices to offer a cloud-based PC to vendors (or FTEs), providing virtual hardware in a software-defined environment.

Any reduction in VPN will be contingent on a massive effort to shift and lift hundreds of applications that are dependent on the corporate network. These applications need to become internet facing, whether through a proxy solution or through an intentional migration to the cloud. As these workloads shift, we’ll be able to reduce daily VPN usage and eventually eliminate it entirely for most of the infoworker persona.

In-building connectivity

With the continued shift to wireless, wired ethernet connectivity in buildings must diverge into two distinctly different use cases and fulfillment methods: concentrated wired centers for remaining high bandwidth workflows, and wired ethernet catering to IoT devices where power and data convergence needs are emerging with Power over Ethernet (PoE).

Pervasive wired Ethernet throughout our facilities has over the years created an unintended consequence—unique developer roles and workloads that have inhibited consolidation of labs, content hosts, and critical development components into centralized real estate spaces.

This creates a cycle of over-provisioning wired networking infrastructure in each new building constructed—wanting to have the capacity there if it’s ever needed— and has led to today’s current state of drastically underutilized switching infrastructure.

The average user-access switch utilization is currently between 10 percent and 20 percent in many regions, and closer to 30 percent or 40 percent in development-heavy regions like the Puget Sound and Asia.

Network infrastructure security

As devices and services have become more internet facing, gaps in capability or new threats have emerged that require additional security controls to augment the traditional controls that remain.

Cloud-based services often require application firewalls or private access controls to limit attack surfaces, enforce strong authentication, and detect inappropriate or malicious use. Devices often require solutions that are effective if they move between network medias (wired, wireless, cellular), to different network segments, or between corporate and external networks.

The threat landscape is constantly changing, and the way devices or services depend on the network has also become fundamentally different with cloud, mobility, and new types of networked devices. Because of this, the mechanisms to provide effective controls must be adapted and continue to evolve.

As our strategy unfolds, we’ll be making priority decisions on what area in our vision and strategy we’re ready to invest in over time. To that end, we’re currently addressing several questions that directly influence how we move forward with next-generation connectivity:

What is the preferred edge solution for our knowledge workers?
How do we design the developer and engineering segment?
What are the security requirements to protect the unmanaged devices that connect to our network every day?

We’re continuing to make changes to adapt to new traffic patterns, use cases, security requirements, and other use-driven demands of our network infrastructure. We’ll continue to transform our network, embracing new WAN and edge security models associated with user identities, devices, and applications. Software-defined infrastructure and processes will bring the network and application security to the places where end users can consume them when needed, regardless of where they’re physically located, thus providing a more agile, more secure, and more effective network environment throughout Microsoft.

The post Transforming Microsoft’s enterprise network with next-generation connectivity appeared first on Inside Track Blog.

Managing user identities and secure access at Microsoft

Inside Track staff — Mon, 30 Oct 2023 15:13:26 +0000

Managing identities and network access at Microsoft encompasses all the processes and tools used throughout the identity life cycle for employees, supplier staff, and partners. As a cloud-first company, our Microsoft Digital Employee Experience (MDEE) team uses features in the Microsoft Enterprise Mobility + Security suite, powered by Microsoft Azure, along with on-premises identity and access management solutions to enable our users to be securely productive, from anywhere.

We’re on a multi-year journey of transforming into a cloud-first, mobile-first enterprise. Though we operate in a hybrid cloud environment today, we’re moving on-premises identity technologies to the cloud, giving our employees flexibility they need. Plus, application owners can use the power of Microsoft Graph to effectively manage access to applications and resources.

[See how we’re implementing strong user authentication with Windows Hello for Business. Learn more about verifying identity in a Zero Trust model internally at Microsoft. Unpack implementing a Zero Trust security model at Microsoft.]

Unifying the environment

To enable a single user identity for authentication and offer a unified experience, we integrated on-premises Windows Server Active Directory forests with Microsoft Azure Active Directory (Azure AD). Our geographically distributed Active Directory environment uses Windows Server 2016. We use Azure AD Connect and Active Directory Federation Services (AD FS) when an Azure-based application needs user attributes—for example, their location, organization, or job title. User information is available if the service has the right permissions to query for those attributes.

As shown in the image below, our identity and access environment is hybrid, federated, and cloud-synced.

A high-level overview of the Microsoft identity and access environment.

The Microsoft identity environment includes:

131,300 employees
303,000 global identities
488,000 partner identities
10,400 privileged identities that have some level of elevated access
15 million authentication requests per month
1.6 million cloud applications—99 percent of our apps—using Azure Active Directory (Azure AD)
3,000 applications using Active Directory Federation Services (AD FS)

Microsoft Azure Active Directory Connect

Microsoft Azure Active Directory Connect integrates on-premises directories with Microsoft Azure Active Directory. It gives users a single identity in Office 365, Azure, and software as a service (SaaS) applications that are integrated with Azure AD. Azure AD Connect consists of three main components:

Synchronization services. Microsoft Azure AD Connect sync services creates users, groups, and other objects. It makes sure that the on-premises identity for users and groups matches the cloud identity.
AD Federation Services. Federation is an optional part of Microsoft Azure AD Connect that’s used to configure hybrid environments using an on-premises AD FS infrastructure. It supports single sign-on and enforces Active Directory sign-on policy with smart card or third-party, multifactor authentication.
Health Monitoring. Microsoft Azure AD Connect Health offers a central location in the Microsoft Azure portal to monitor system health.

Enabling identity models in Microsoft 365

Microsoft 365 supports three identity models that support a variety of identity scenarios. Depending on how you manage identities, you can use a cloud identity model, federated identity model, or the synchronized identity model.

We use the federated model where we synchronize on-premises directory objects with Microsoft 365 and manage our users on-premises. The users have the same password on-premises and in the cloud, and they do not have to sign in again to use Microsoft 365. The user password is verified by AD FS—the password hash doesn’t need to be synchronized to Microsoft Azure AD, and the user doesn’t have to sign in again to use Microsoft 365.

Enabling users

Every employee, supplier staff, or partner that needs access to the corporate network receives an email address to sign in to their primary account. That primary account is synced to Microsoft Azure AD and gives the user access to corporate resources, Microsoft 365, Microsoft SaaS, and corporate business unit and third-party SaaS and platform as a service (PaaS) applications (such as apps for expenses or travel).

Strong authentication

We require multifactor authentication to verify a user’s identity before giving them access to corporate resources when they’re not connected to the corporate network. People use multifactor authentication in a few ways, including certificate-backed virtual and physical smart cards, Windows Hello for Business with PIN or biometric sign in, and Microsoft Azure Multi-Factor Authentication (MFA) that uses a phone or the Microsoft Authenticator app. On domain-joined devices that we manage, multifactor authentication has become almost transparent to users.

Currently, the use rate for each authentication method is approximately:

Certificate-based using virtual or physical smart cards—21 percent.
Windows Hello for Business—25 percent.
Microsoft Azure MFA using phone authentication or an authenticator app—54 percent.

Certificate-based

For many years, certificated-based physical and virtual smart cards were the main method of multifactor authentication. As the other options have been enabled, smart card use has been declining.

Windows Hello for Business

With the deployment of Windows 10 and Windows 11, we enabled Windows Hello for Business, which can replace passwords with strong two-factor authentication that combines an enrolled device with a PIN or biometric (fingerprint or facial recognition) to sign in. Windows Hello was easily implemented within our existing identity infrastructure, by extending certificates to include the use of a PIN or biometrics as an enterprise credential; plus, it allows remote access. Users can sign in to their Microsoft account, an Active Directory account, or an Azure AD Premium account.

Microsoft Azure MFA

Although Windows Hello has become the preferred method for our Windows 10 and Windows 11 domain-joined devices, we support access using mobile platforms such as iOS and Android. Microsoft Azure MFA is the best solution for securing our users and data on these platforms, and it integrates seamlessly with our existing AD FS infrastructure.

Enabling partner access

For Microsoft partners, we’ve started using Microsoft Azure Active Directory business-to-business (B2B) collaboration. Microsoft Azure AD B2B collaboration make it easier to enable single sign-on access to Microsoft extranet applications and collaboration spaces.

Invitation lists are created using manually exported lists in the form of comma-separated value (.csv) files. We’re working on automating the process to reduce the potential for error and increase our speed.

Enabling mobile access

As a cloud-first company, many of our corporate resources are in the cloud. People can use multifactor authentication to securely access their work from anywhere. To enable mobile access to on-premises resources, we use a couple of remote access solutions:

We help ensure the security of cloud resources and remote access at Microsoft by validating who the user is through multifactor authentication.
We check system health to ensure that the user accesses corporate resources on a device that’s managed through System Center Configuration Manager or Microsoft Intune, and that the device has all the latest updates installed.

We introduced Conditional Access in a Windows 10 update and are continuing its use with Windows 11. To help ensure that users sign in from a healthy device with strong authentication, we also configure device management policies for remote access.

Role and user attribute access roles

We’ve started to implement role-based and user attribute–based access. We created a couple of dynamic groups that set up parameters for variable access to resources based on device, location, and type of user. We’ve focused on large user categories—such as employees and supplier staff—but we’re working on adding more specific roles.

Self-service

Cloud services gave us the ability to introduce more self-service capabilities for identity and access management. These services have helped reduce manual administrative tasks and Helpdesk support calls for help with password and identity management changes.

Password management. Microsoft employees can change their passwords using an internal, cloud-based, self-service password management solution. We integrated Azure MFA, including verification with a phone call or mobile app, as part of the process. Users are prompted to answer verification questions when they change a password. When users need to change their password, they do so without calling Helpdesk.
Security and distribution group management. Tools like Microsoft 365 Teams help users manage their teams without going through an administrator or Helpdesk to create and manage their groups. Group owners can set up access policies to user groups rather than individual.
Microsoft Azure AD Join. At Microsoft, we support bring your own device (BYOB) scenarios; many employees do part of their work on their personal device. In Windows 10 and now in Windows 11, our users can add an Azure AD account, and their device will be enrolled in mobile device management through Microsoft Intune.

Managing the service

To manage on-premises and cloud identity services and enable day-one productivity for new users, we use established processes for provisioning and deprovisioning user identities, and to monitor and audit account usage.

User account life cycle management

User account life cycle management includes all processes for provisioning and deprovisioning user identities for both full‑time employees (FTEs) and supplier staff. New identities are created in our accounts provisioning system and in Active Directory. After an account is created, user sync enables it in Microsoft Azure AD. Account provisioning is the same for both FTE and supplier staff identities, but they are granted different levels of access. For example, by default, FTEs are granted remote access, but it’s granted for supplier staff only upon manager approval.

Provisioning

To provision an active user account:

A business manager or Human Resources employee submits a provisioning request, which includes information such as the legal name of the user, the domain, their physical location, and if they have an office number or are a mobile worker.
After the request is submitted, the data is consumed by SAP, which generates a unique employee ID number for each person. New employee data automatically goes through the SAP web service to the accounts provisioning service.
The accounts service receives the employee ID from the SAP web service, with an action to provision the user account. We create the user identity, alias, and a temporary password. We also create the mailbox where the mailbox object is stamped in Active Directory.
The user employee ID number and provisioned alias are provided as a key pair back to the SAP web service.
- SAP publishes the data through an HC01 feed, which gives the user identity access to Human Resources productivity sites, including finance and benefits.
- In parallel, the accounts service sends account details (alias, temporary password, and mailbox) to the manager, sponsors, and other designated recipients identified in the original provisioning request.

Deprovisioning

When we receive a deprovisioning request from a business manager or Human Resources, we terminate and disable the user account from corporate access, including alternate and elevated access credentials. We reset the user’s network password and disable remote access, Office Web Access, and Skype for Business.

Automating our processes

For the past several years, we’ve automated our core service scenarios to help ensure an active user account is provisioned before an employee’s start date, so employees will be productive on their first day. This includes provisioning scenarios for new hires, rehires, new supplier staff, and supplier staff to FTE conversions. We plan to automate additional post-provisioning activities, such as renaming aliases, domain moves, and remote access requests. We also plan to migrate the service to a third-party platform that will provide increased efficiency, integration, and provide more identity-related self-service capabilities.

Auditing and monitoring

Auditing identities is like other services—we collect event types in our central log management system. For monitoring identity events, we coordinate with the engineering team to define use cases and the monitoring conditions that would alert us. Those correlated conditions could come from any of the different monitoring pipelines, including Microsoft Cloud App Security.

Network monitors and endpoint monitoring, like Windows Defender Advanced Threat Protection (ATP), feed in to our security and information event management (SIEM) solution. When we get an alert, the service operations team determines whether the alert is valid and if it needs to be opened for investigation. If we determine that it’s a real alert, we use the information in the SIEM and in the Defender ATP console to begin investigating. Depending on the severity of the alert, it could be escalated to the incident response team, the legal department, or even law enforcement.

To help us deal with the number of alerts, we use a third-party behavioral analytics tool that helps reduce event noise and helps us better identify real events.

Cloud-enabled protection

Our traditional approach was to protect all the assets on the corporate network. Because we’ve moved most of our assets to the cloud, we now focus on security at the user level. We do that by introducing better user and admin accountability with security and governance, including:

Controlling access to resources.
Requiring strong user authentication.
Responding to advanced threats by monitoring for specific risk-based scenarios.
Mitigating administrative risks.
Governance of on-premises and cloud identities.

Using privileged identities to manage elevated access

Privileged identity management is an important part of a larger effort within Microsoft Digital to secure high-value corporate assets. Of the roughly 303,000 identities that we manage, approximately 10,000 on‑premises and 400 Microsoft Azure AD users need elevated access to data and services. We have other tools and a process for the subset of users that have administrative—or elevated—access to data and services at Microsoft.

One important way we protect elevated access accounts is through just in time (JIT) access. Rather than have separate credentials or a persistent admin session, JIT elevates access for a specific duration. Access expires at the end of that time.

We also protect elevated accounts by requiring on-premises admins to sign in on secure access workstations, and we plan to expand that requirement to cloud admins.

As our identity and access management solution has evolved, we’ve discovered a few best practices, including:

If your Human Resources systems are primarily on-premises, you can use a hybrid approach to create user identities in Active Directory and sync them to the cloud.
Assess your environment honestly to determine how to best use the cloud to reduce complexity for on-premises process points. Migrating to the cloud offers an opportunity to reassess existing processes and identify ways that the cloud can make your identity infrastructure more efficient. The scalability of Azure solutions offer advantages that can help modernize processes and technology dependencies.
Identity is the new security perimeter. Azure provide the ability through it’s security products to help protect those identities much more effectively than traditional network/datacenter monitoring.

As for our future, we’re exploring other changes to identity management, including eliminating time-bound password expiration by moving to a system where passwords stop working only when triggered by specific risks or user behaviors.

Beyond that, we look forward to a future where our identities are in the cloud and there are no passwords. Within the next few years, we hope to move to a purely cloud-based service model, rather than our current, cloud-enabled state. Windows Hello for Business was our first step toward a future where biometrics replace passwords.

Also, we’re deploying Azure Information Protection at Microsoft. Azure Information Protection integrates with cloud identities to help protect corporate data. We continually look for ways that we can be more efficient as we move to cloud-only solutions for identity and access management.

Please share your feedback with us—take our survey and let us know what kind of content is most useful to you.

The post Managing user identities and secure access at Microsoft appeared first on Inside Track Blog.

Sharing how Microsoft now secures its network with a Zero Trust model

Jason Kellington — Thu, 27 Jul 2023 15:00:26 +0000

Editor’s note: We’ve republished this blog with a new companion video.

Safeguarding corporate resources is a high priority for any business, but how does Microsoft protect a network perimeter that extends to thousands of global endpoints accessing corporate data and services 24 hours a day, seven days a week?

It’s all about communication, collaboration, and expert knowledge.

Phil Suver, senior director of networking at Microsoft, and his team help champion Zero Trust networking, an important part of Microsoft’s broader Zero Trust initiative. Driven by Microsoft’s security organization, the Zero Trust model centers on strong user identity, device-health verification, application-health validation, and secure, least-privilege access to corporate resources and services.

“Zero Trust networking is ultimately about removing inherent trust from the network, from design to end use,” Suver says. “The network components are foundational to the framework for our Zero Trust model. It’s about revising our security approach to safeguard people, devices, apps, and data, wherever they’re located. The network is one piece. Identity and device health are additional pieces. Conditional access and permission are also required. That involves the entire organization.”

Indeed, this extensive initiative affects all of Microsoft and every employee. To support the Zero Trust initiative, Microsoft’s network engineering team is partnering with the security and end-user experience teams to implement security policy and identity while ensuring productivity for employees and partners. Suver says that his team always aims to minimize those impacts and communicate how they ultimately result in benefits.

“We’re fundamentally changing the way our network infrastructure has worked for more than two decades,” Suver says. “We’re moving away from internal private networks as the primary destination and toward the internet and cloud services as the new default. The security outcomes are the priority, but we have to balance that against business needs and productivity as well so that connectivity is transparent.”

That means being sensitive to how implementing Zero Trust networking affects users on a granular level, to ensure that employees don’t experience work stoppages or interruptions.

“Some of our efforts aren’t very disruptive, as they’re simply accelerating in a direction we were already heading,” Suver says. “Shifting to internet-first and wireless-first network design, and enabling remote work are examples of that. Others are indeed disruptive, so we work closely with the affected groups to help them understand the impact.”

Suver notes that understanding and communication are critical to avoiding disruption.

“Microsoft is an established enterprise, with some software and systems that have been in place for decades,” he says. “To run our business effectively, we must be able to accommodate processes and technology that might not be immediately ready to transition to a Zero Trust architecture.”

We’re able to be a little more opportunistic and aggressive with our in-building connectivity experiences while our user base is working remotely. This has allowed us to roll out configurations and learn things with a much smaller user population on-campus.

– Phil Suver, senior director of networking

Suver stresses the importance of working closely with affected employees. “We need to partner closely with our engineering teams to understand their connectivity requirements and build solutions around those for the short term and then broaden our scope for the longer term.”

With more employees working from home than ever due to COVID-19, many deployments in Microsoft buildings have been implemented relatively quickly and efficiently because of the decreased on-campus presence. Engineering teams can perform rollouts, including deploying new network segments, creating new wireless connections, and deploying network security policy with much less disruption than if buildings were fully occupied.

“We’re able to be a little more opportunistic and aggressive with our in-building connectivity experiences while our user base is working remotely,” Suver says. “This has allowed us to roll out configurations and learn things with a much smaller user population on-campus.”

[Check out these lessons learned and best practices from the Microsoft engineers who implemented Zero Trust networking. Find out what Microsoft’s leaders learned when they deployed Zero Trust networking internally at the company. Read Brian Fielder’s story on how Microsoft helps employees work securely from home using a Zero Trust strategy.]

Managing Zero Trust networking across the enterprise

Mildred Jammer, Zero Trust network principal program manager at Microsoft, acknowledges that the inherent complexity of Microsoft’s operations—there are more than 1 million devices on Microsoft’s network, which supports more than 200,000 employees and partners, which requires a highly strategic planning approach to transition to a Zero Trust environment.

“It’s a huge scope, and we have so many different environments to consider. Unsurprisingly, planning is a top priority for our teams,” says Jammer, whose work centers around ensuring that people and functional groups across Microsoft unite to ensure that Zero Trust Networking initiatives receive the priority that they deserve.

Zero Trust networking goals include reducing risk to Microsoft by requiring devices to authenticate to achieve network access providing a network infrastructure that supports device isolation and segmentation. A third key goal is to devise a system for enhancing response actions if devices are determined to be vulnerable or compromised.

“Zero Trust networking extends beyond the scope of Microsoft networking teams,” Jammer says.

Jammer says that many business groups at Microsoft might not understand what Zero Trust networking is, or they might not consider it as important as other initiatives they’re supporting.

“Zero Trust networking is a huge priority for our teams, but our business groups have their own priorities that don’t account for Zero Trust,” Jammer says. “Neither priority is optional, and they may conflict. We must manage that.”

She says communication being upfront with requirements, and collaborating willingly across Microsoft to ensure everyone’s needs are met.

Jammer says that distilling high-level goals into smaller, more achievable objectives helps employees and partners understand the practicalities of Zero Trust networking so that her teams can establish realistic expectations. “For example, we worked with the security team to break down risk mitigation into specific risks and the outcomes,” she says. “We developed solutions to deliver the outcomes and grouped them when there were commonalities. If business priorities challenged our outcomes, we could break down those groupings, as necessary.”

Jammer cites the deployment of Zero Trust networking as an example, noting that her team initially planned to deploy globally across all wired and wireless networks.

“We planned for a full deployment, but soon learned how disruptive that would be to our developers and infrastructure,” she says. “So, we broke it into chunks, we implemented changes to wireless networks with internet-first posture, and then came back to address our wired networks. To minimize impact and identify best practices, we used flighting deployments with a ring-based approach, starting with a smaller, well-understood population that closely represented our larger target population. As we gained more experience and confidence, we expanded the deployment to reach a larger population.”

Jammer notes that using targeted, achievable goals not only help get work done but also help identify when larger goals might be challenging to accomplish.

“Breaking down large goals into an agile-friendly process was also crucial to demonstrate areas that simply weren’t achievable near term,” Jammer says. “It’s more concrete and actionable to tell someone that we can’t refactor a specific app to be internet-facing than it is to say that we can’t eliminate our corporate intranet infrastructure.”

Making Zero Trust networking a reality

For David Lef, Zero Trust principal IT enterprise architect at Microsoft, implementing Zero Trust networking in a live networking environment carries a significant challenge.

“Reducing risk is a big focus in Zero Trust, but we need to do so with as minimal impact to user experience and productivity as possible,” Lef says. “Our users and employees need this network to perform their job functions. There is a reality that some things have to continue to work in their current state.”

Lef cites a few examples, including printers that didn’t support internet connectivity, IoT devices that required manual configuration, and simple devices that didn’t support Dynamic Host Configuration Protocol (DHCP). “We isolate those on the network and potentially come back to them later while we address projects that are ready to adapt to Zero Trust.”

Lef’s team actively works to establish network access, implement policies, segment networks, and onboard Microsoft business groups, regions, and teams to the Zero Trust networking model. While Zero Trust networking is critical to enabling a Zero Trust model, enterprise-wide collaboration and adoption are equally vital.

“We put a lot of effort into observing activity and talking with our local IT representatives about the details and challenges of each phase of our implementation,” Lef says. “We created our deployment plans so that employees and partners could naturally adopt the new network designs and usage patterns without significant effort on their part.”

For a transcript, please view the video on YouTube: https://www.youtube.com/watch?v=OCsTRnAb-pg, select the “More actions” button (three dots icon) below the video, and then select “Show transcript.”

Lef and Suver discuss how Microsoft helps its employees stay productive while working remotely.

Lef notes that his team recommends best practices to partners and suppliers to help build Zero Trust-friendly products and solutions.

“Making legacy technology conform to Zero Trust is difficult. We want to adopt solutions built for Zero Trust networking across our entire enterprise as much as possible. Identities, devices, apps, data, infrastructure—they all contribute to the model, along with networking,” Lef says. “Across the organization, all of these need to be in place for a properly functioning Zero Trust model.”

Thinking about the broader picture

Soumya Subramanian, partner general manager of enterprise infrastructure services at Microsoft, recognized a need to bring multiple workstreams together to accommodate the size and scope of deploying Zero Trust networking.

As organizations consider the scope of what they want to achieve with Zero Trust, they should remember to think about other network modernization initiatives and be intentional in either combining them under the broader program or allowing them to operate independently.

– Soumya Subramanian, partner general manager of Enterprise Infrastructure Services

Soumya Subramanian is a partner general manager of Enterprise Infrastructure Services at Microsoft. (Photo submitted by Soumya Subramanian)

“We already had a workstream in flight to move remaining applications from the corporate network to the cloud” Subramanian says. “We also needed to accelerate our long-term plans for remote connectivity due to the pandemic, which allowed us to reevaluate remote access technologies under the context of Zero Trust. For instance, as you move high-volume applications off the corporate network and onto the cloud, you reduce VPN volumes and usage. You need to consider alternate remote connectivity solutions like Secure Access Service Edge (SASE), virtual desktops, and application proxy services in your Zero Trust networking scope, not just the in-building user experience.”

Subramanian notes that these efforts depend on network automation and data-collection workstreams that many organizations could use to accelerate Zero Trust deployment.

“We started to tie these efforts together so that the network designs and policies we created for Zero Trust could be managed through automation at scale. As a result, we’re more data driven with clear objectives and key results that connect these dependent workstreams.”

“As organizations consider the scope of what they want to achieve with Zero Trust, they should remember to think about other network modernization initiatives and be intentional in either combining them under the broader program or allowing them to operate independently,” Subramanian says.

The post Sharing how Microsoft now secures its network with a Zero Trust model appeared first on Inside Track Blog.

Microsoft’s digital security team answers your Top 10 questions on Zero Trust

Mark Skorupa — Tue, 18 Jul 2023 19:31:58 +0000

Our internal digital security team at Microsoft spends a fair amount of time talking to enterprise customers who face similar challenges when it comes to managing and securing a globally complex enterprise using a Zero Trust security model. While every organization is unique, and Zero Trust isn’t a “one size fits all” approach, nearly every CIO, CTO, or CISO that we talk to is curious to learn more about our best practices.

We thought it would be useful to share our answers to the Top 10 Zero Trust questions from customers across the globe.

It’s surprising to us how many companies haven’t embraced multifactor authentication. It’s the first step we took on our Zero Trust journey.

– Mark Skorupa, principal program manager

If you had to pick, what are your top three Zero Trust best practices?

Microsoft’s approach to Zero Trust means we don’t assume any identity or device on our corporate network is secure, we continually verify it.

With that as context, our top three practices revolve around the following:

Identities are secure using multifactor authentication (MFA): It’s surprising to us how many companies haven’t embraced multifactor authentication. It’s the first step we took on our Zero Trust journey. Regardless of what solution you decide to implement, adding a second identity check into the process makes it significantly more difficult for bad actors to leverage a compromised identity over just passwords alone.
Device(s) are healthy: It’s been crucial that Microsoft can provide employees secure and productive ways to work no matter what device they’re using or where they’re working, especially during remote or hybrid work. However, any devices that access corporate resources must be managed by Microsoft and they must be healthy, meaning, they are running the latest software updates and antivirus software.
Telemetry is pervasive: The health of all services and applications must be monitored to ensure proper operation and compliance and enable rapid response when those conditions are not met. Before granting access to corporate resources, identities and devices are continually verified to be secure and compliant. We monitor telemetry looking for signals to identify anomalous patterns. We use telemetry to measure risk reduction and understand the user experience.

For a transcript, please view the video on YouTube: https://www.youtube.com/watch?v=TOrbiC8DGPE, select the “More actions” button (three dots icon) below the video, and then select “Show transcript.”

At Ignite 2020, experts on Microsoft’s digital security team share their lessons learned from implementing a Zero Trust security model at the company.

Does Microsoft require Microsoft Intune enrollment on all personal devices? Can employees use their personal laptops or devices to access corporate resources?For employees who want access to Microsoft corporate resources from a personal device, we require that devices be enrolled in Microsoft Intune. If they don’t want to enroll their personal device, that’s perfectly fine. They can access corporate resources through the following alternative options:

Windows Virtual Desktop allows employees and contingent staff to use a virtual remote desktop to access corporate resources like Microsoft SharePoint or Microsoft Teams from any device.
Employees can use Outlook on the web to access their Microsoft Outlook email account from the internet.

How does Microsoft onboard its Internet of Things (IoT) devices under the Zero Trust approach?

IoT is a challenge both for customers and for us.

Internally, Microsoft is working to automate how we secure IoT devices using Zero Trust. In June, the company announced the acquisition of CyberX, which will complement existing Microsoft Azure IoT security capabilities.

We segment our network and isolate IoT devices based on categories, including high-risk devices (such as printers); legacy devices (like digital coffee machines) that may lack the security controls required; and modern devices (such as smart personal assistant devices like an Amazon Echo) with security controls that meet our standards.

How is Microsoft moving away from VPN?

We’ve made good progress in moving away from VPN by migrating legacy, on-premises applications to cloud-based applications. That said, we still have more work to do before we can eliminate VPN for most employees. With the growing need to support remote work, we moved quickly to redesign Microsoft’s VPN infrastructure by adopting a split-tunneled configuration where traffic is directly routed to the applications available in the cloud and through VPN for any legacy applications. The more legacy applications we make available directly from the internet, the less we need VPN.

How do you manage potential data loss?

Everyone at Microsoft is responsible for protecting data, and we have specific scenarios that call for additional security when accessing sensitive data. For example, when an employee needs to make changes to customer-facing production systems like firewalls, they use privileged access workstations, a dedicated operating system for sensitive tasks.

Our employees also use features in Microsoft Information Protection, like the sensitivity button in Microsoft 365 applications to tag and classify documents. Depending on the classification level—even if a document moves out of our environment—it can only be opened by someone that was originally provided access.

How can Zero Trust be used to isolate devices on the network to further reduce an attack surface?

The origins of Zero Trust were focused on micro-segmentation of the network. While Microsoft’s focus extends beyond the physical network and controlling assets regardless of connectivity or location, there is still a strong need for implementing network segmentation within your physical network.

We currently have segmented our network into the configuration shown in the following diagram, and we’re evaluating future segments as the need arises. For more details on our Zero Trust strategy around networking, check out Microsoft’s approach to Zero Trust Networking and supporting Azure technologies.

Network segmentation is used to isolate certain devices, data, or services from other resources that have direct access.

How do you apply Zero Trust to a workstation where the user is a local admin on the device?

For us, it doesn’t matter what the device or workstation is, or the type of account used—any device that is looking for access to corporate resources needs to be enrolled and managed by Microsoft Intune, our device management service. That said, our long-term vision is to build an environment where standard user accounts have the permission levels to be just as productive as local admin accounts.

How important is it to have Microsoft Azure AD (AAD), even if we have Active Directory (AD) on-premises, for Zero Trust to work in the cloud? Can on-premises Active Directory alone work to implement Zero Trust if we install Microsoft Monitoring Agent (MMA) to it?

Because Microsoft has shifted most of our security infrastructure to the Microsoft Azure cloud, using Microsoft Azure AD Conditional Access is a necessity for us. It helps automate the process and determine which identities and devices are healthy and secure, which then enforces the health of those devices.

Using MMA would get you to some level of parity, but you wouldn’t be able to automate device enforcement. Our recommendation is to create an AAD instance as a replica of your on-premises AD. This allows you to continue using your on-premises AD as the master but still leverage AAD to implement some of the advanced Zero Trust protections.

How do you deal with Zero Trust for guest access scenarios?

When allowing guests to connect to resources or view documents, we use a least-privileged access model. Documents tagged as public are readily accessible, but items tagged as confidential or higher require the user to authenticate and receive a token to open the documents.

We also tag resources like Microsoft SharePoint or Microsoft Teams locations that block guest access capabilities. Regarding network access, we provide a guest wireless service set identifier (SSID) for the guest to connect to which are isolated with internet only access. Finally, all guest accounts are required to meet our MFA requirements prior to granting access.

We hope this guidance is helpful to you no matter what stage of the Zero Trust journey you’re on. As we look to 2021, the key lesson is to have empathy. Understanding where an employee is coming from and being transparent with them about why a policy is shifting or how it may impact them is critical.

– Mark Skorupa, principal program manager

What’s your Zero Trust priority for 2021?

We’re modernizing legacy and on-premises apps to be available directly from the internet. Making these available, even apps with legacy authentication requirements, allows our device management service to apply conditional access, which enforces verification of identities and ensures devices are healthy.

We hope this guidance is helpful to you no matter what stage of the Zero Trust journey you’re on. As we look to the rest of 2021, our team continues to come back to is the importance of empathy. Understanding where an employee is coming from and being transparent with them about why a policy is shifting or how it may impact them is critical.

Microsoft wasn’t born in the cloud either, so many of the digital security shifts we’re making by taking a Zero Trust approach aren’t familiar to our employees or can be met with hesitancy. We take ringed approaches to everything we roll out, which enables us to pilot, test, and iterate on our solutions based on feedback.

Leading with empathy keeps us focused on making sure employees are productive and efficient, and that they can be stewards of security here at Microsoft and with our customers.

Read this article about how Microsoft is adopting a Zero Trust security model to secure corporate and customer data.

The post Microsoft’s digital security team answers your Top 10 questions on Zero Trust appeared first on Inside Track Blog.

Providing modern data transfer and storage service at Microsoft with Microsoft Azure

Inside Track staff — Thu, 13 Jul 2023 14:54:07 +0000

Companies all over the world have launched their cloud adoption journey. While some are just starting, others are further along the path and are now researching the best options for moving their largest, most complex workflows to the cloud. It can take time for companies to address legacy tools and systems that have on-premises infrastructure dependencies.

Our Microsoft Digital Employee Experience (MDEE) team has been running our company as mostly cloud-only since 2018, and continues to design cloud-only solutions to help fulfill our Internet First and Microsoft Zero Trust goals.

In MDEE, we designed a Modern Data Transfer Service (MDTS), an enterprise-scale solution that allows the transfer of large files to and from partners outside the firewall and removes the need for an extranet.

MDTS makes cloud adoption easier for teams inside Microsoft and encourages the use of Microsoft Azure for all of their data transfer and storage scenarios. As a result, engineering teams can focus on building software and shipping products instead of dealing with the management overhead of Microsoft Azure subscriptions and becoming subject matter experts on infrastructure.

[Unpack simplifying Microsoft’s royalty ecosystem with connected data service. | Check out how Microsoft employees are leveraging the cloud for file storage with OneDrive Folder Backup. | Read more on simplifying compliance evidence management with Microsoft Azure confidential ledger.]

Leveraging our knowledge and experience

As part of Microsoft’s cloud adoption journey, we have been continuously looking for opportunities to help other organizations move data and remaining legacy workflows to the cloud. With more than 220,000 employees and over 150 partners that data is shared with, not every team had a clear path for converting their transfer and storage patterns into successful cloud scenarios.

We have a high level of Microsoft Azure service knowledge and expertise when it comes to storage and data transfer. We also have a long history with legacy on-premises storage designs and hybrid third-party cloud designs. Over the past decade, we engineered several data transfer and storage services to facilitate the needs of Microsoft engineering teams. Those services traditionally leveraged either on-premises designs or hybrid designs with some cloud storage. In 2019, we began to seriously look at replacing our hybrid model, which included a mix of on-premises resources, third party software, and Microsoft Azure services, with one modern service that would completely satisfy our customer scenarios using only Azure—thanks to new capabilities in Azure making it possible and it being the right time.

MDTS uses out of the box Microsoft Azure storage configurations and capabilities to help us address legacy on-premises storage patterns and support Microsoft core commitments to fully adopt Azure in a way that satisfies security requirements. Managed by a dedicated team of service engineers, program managers, and software developers, MDTS offers performance, security, and is available to any engineering team at Microsoft that needs to move their data storage and transfer to the cloud.

Designing a Modern Data Transfer and Storage Service

The design goal for MDTS was to create a single storage service offering entirely in Microsoft Azure, that would be flexible enough to meet the needs of most engineering teams at Microsoft. The service needed to be sustainable as a long-term solution, continue to support ongoing Internet First and Zero Trust Network security designs, and have the capability to adapt to evolving technology and security requirements.

Identifying use cases

First, we needed to identify the top use cases we wanted to solve and evaluate which combination of Microsoft Azure services would help us meet our requirements. The primary use cases we identified for our design included:

Sharing and/or distribution of complex payloads: We not only had to provide storage for corporate sharing needs, but also share those same materials externally. The variety of file sizes and different payload characteristics can be challenging because they don’t always fit a standard profile for files (e.g., Office docs, etc.).
Cloud storage adoption (shifting from on-premises to cloud): We wanted to ensure that engineering teams across Microsoft that needed a path to the cloud would have a roadmap. This need could arise because of expiring on-premises infrastructure, corporate direction, or other modernization initiatives like ours.
Consolidation of multiple storage solutions into one service, to reduce security risks and administrative overhead: Having to place data and content in multiple storage datastores for the purposes of specific sharing or performance needs is both cumbersome and can introduce additional risk. Because there wasn’t yet a single service that could meet all their sharing needs and performance requirements, employees and teams at Microsoft were using a variety of locations and services to store and share data.

Security, performance, and user experience design requirements

After identifying the use cases for MDTS, we focused on our primary design requirements. They fell into three high-level categories: security, performance, and user experience.

Security

The data transfer and storage design needed to follow our Internet First and Zero Trust network design principles. Accomplishing parity with Zero Trust meant leveraging best practices for encryption, standard ports, and authentication. At Microsoft, we already have standard design patterns that define how these pieces should be delivered.

Encryption: Data is encrypted both in transit and at rest.
Authentication: Microsoft Azure Active Directory supports both corporate synced domain accounts, external business-to-business accounts, as well as corporate and external security groups. Leveraging Azure Active Directory allows teams to remove dependencies on corporate domain controllers for authentication.
Authorization: Microsoft Azure Data Lake Gen2 storage provides fine grained access to containers and subfolders. This is possible because of many new capabilities, most notably the support for OAuth, hierarchical name space, and POSIX permissions. These capabilities are necessities of a Zero Trust network security design.
No non-standard ports: Opening non-standard ports can present a security risk. Using only HTTPS and TCP 443 as the mechanisms for transport and communication prevents opening non-standard ports. This includes having software capable of transport that maximizes the ingress/egress capabilities of the storage platform. Microsoft Azure Storage Explorer, AzCopy, and Microsoft Azure Data Factory meet the no non-standard ports requirement.

Performance

Payloads can range from being comprised of one very large file, millions of small files, and every combination in between. Scenarios across the payload spectrum have their own computing and storage performance considerations and challenges. Microsoft Azure has optimized software solutions for achieving the best possible storage ingress and egress. MDTS helps ensure that customers know what optimized solutions are available to them, provides configuration best practices, and shares the learnings with Azure Engineering to enable robust enterprise scale scenarios.

Data transfer speeds: Having software capable of maximizing the ingress/egress capabilities of the storage platform is preferable for engineering-type workloads. It’s common for these workloads to have complex payloads, payloads with several large files (10-500 GB) or millions of small files.
Ingress and egress: Support for ingress upwards of 10 Gbps and egress of 50 Gbps. Furthermore, client and server software that can consume the maximum amount of bandwidth possible up to the maximum amount in ingress/egress possible on client and storage.

Data size/ bandwidth	50 Mbps	100 Mbps	500 Mbps	1 Gbps	5 Gbps	10 Gbps
1 GB	2.7 minutes	1.4 minutes	0.3 minutes	0.1 minutes	0.03 minutes	0.010 minutes
10 GB	27.3 minutes	13.7 minutes	2.7 minutes	1.3 minutes	0.3 minutes	>0.1 minutes
100 GB	4.6 hours	2.3 hours	0.5 hours	0.2 hours	0.05 hours	0.02 hours
1 TB	46.6 hours	23.3 hours	4.7 hours	2.3 hours	0.5 hours	0.2 hours
10 TB	19.4 days	9.7 days	1.9 days	0.9 days	0.2 days	0.1 days

Copy duration calculations based on data size and the bandwidth limit for the environment.

User experience

Users and systems need a way to perform manual and automated storage actions with graphical, command line, or API-initiated experiences.

Graphical user experience: Microsoft Azure Storage Explorer provides Storage Admins the ability to graphically manage storage. It also has storage consumer features for those who don’t have permissions for Administrative actions, and simply need to perform common storage actions like uploading, downloading, etc.
Command line experience: AzCopy provides developers with an easy way to automate common storage actions through CLI or scheduled tasks.
Automated experiences: Both Microsoft Azure Data Factory and AzCopy provide the ability for applications to use Azure Data Lake Gen2 storage as its primary storage source and destination.

Identifying personas

Because a diverse set of personas utilize storage for different purposes, we need to design storage experiences that satisfy the range of business needs. Through the process of development, we identified these custom persona experiences relevant to both storage and data transfer:

Storage Admins: The Storage Admins are Microsoft Azure subscription owners. Within the Azure subscription they create, manage, and maintain all aspects of MDTS: Storage Accounts, Data Factories, Storage Actions Service, and Self-Service Portal. Storage Admins also resolve requests and incidents that are not handled via Self-Service.
Data Owners: The Data Owner personas are those requesting storage who have the authority to create shares and authorize storage. Data Owners also perform the initial steps of creating automated distributions of data to and from private sites. Data Owners are essentially the decision makers of the storage following handoff of a storage account from Storage Admins.
Storage Consumers: At Microsoft, storage consumers represent a broad set of disciplines, from engineers and developers to project managers and marketing professionals. Storage Consumers can use Microsoft Azure Storage Explorer to perform storage actions to and from authorized storage paths (aka Shares). Within the MDTS Self Service Portal, a storage consumer can be given authorization to create distributions. A distribution can automate the transfer of data from a source to one or multiple destinations.

Implementing and enhancing the solution architecture

After considering multiple Microsoft Azure storage types and complimentary Azure Services, the MDTS team chose the following Microsoft Azure services and software as the foundation for offering a storage and data transfer service to Microsoft Engineering Groups.

Microsoft Azure Active Directory: Meets the requirements for authentication and access.
Microsoft Azure Data Lake Gen2: Meets security and performance requirements by providing encryption, OAuth, Hierarical namespace, fine grained authorization to Azure Active Directory entities, and 10+ GB per sec ingress and egress.
Microsoft Azure Storage Explorer: Meets security, performance, and user experience requirements by providing a graphical experience to perform storage administrative tasks and storage consumer tasks without needing a storage account key or role based access (RBAC) on an Azure resource. Azure Storage Explorer also has AzCopy embedded to satisfy performance for complex payloads.
AzCopy: Provides a robust and highly performant command line interface.
Microsoft Azure Data Factory: Meets the requirements for orchestrating and automating data copies between private networks and Azure Data Lake Gen2 storage paths. Azure Data Factory copy activities are equally as performant as AzCopy and satisfy security requriements.

Enabling Storage and Orchestration

As illustrated below, the first MDTS design was comprised entirely of Microsoft Azure Services with no additional investment from us other than people to manage the Microsoft Azure subscription and perform routine requests. MDTS was offered as a commodity service to engineering teams at Microsoft in January 2020. Within a few months we saw a reduction of third-party software and on-premises file server storage, which provided significant savings. This migration also contributed progress towards the company-wide objectives of Internet First and Zero Trust design patterns.

The first design of MDTS provides storage and orchestration using out of the box Microsoft Azure services.

We initially onboarded 35 engineering teams which included 10,000 Microsoft Azure Storage Explorer users (internal and external accounts), and 600 TB per month of Microsoft Azure storage uploads and downloads. By offering the MDTS service, we saved engineering teams from having to run Azure subscriptions themselves and needing to learn the technical details of implementing a modern cloud storage solution.

Creating access control models

As a team, we quickly discovered that having specific repeatable implementation strategies was essential when configuring public facing Microsoft Azure storage. Our initial time investment was in standardizing an access control process which would simplify complexity and ensure a correct security posture before handing off storage to customers. To do this, we constructed onboarding processes for identifying the type of share for which we standardized the implementation steps.

We implemented standard access control models for two types of shares: container shares and sub-shares.

Container share access control model

The container share access control model is used for scenarios where the data owner prefers users to have access to a broad set of data. As illustrated in the graphic below, container shares supply access to the root, or parent, of a folder hierarchy. The container is the parent. Any member of the security group will gain access to the top level. When creating a container share, we also make it possible to convert to a sub-share access control model if desired.

Microsoft Azure Storage Explorer grants access to the root, or parent, of a folder hierarchy using the container share access control model. Both engineering and marketing are containers. Each has a specific Microsoft Azure Active Directory Security group. A top-level Microsoft Azure AD Security group is also added to minimize effort for users who should get access to all containers added to the storage account.

This model fits scenarios where group members get Read, Write, and Execute permissions to an entire container. The authorization allows users to upload, download, create, and/or delete folders and files. Making changes to the Access Control restricts access. For example, to create access permissions for download only, select Read and Execute.

Sub-share access control model

The sub-share access control model is used for scenarios where the data owner prefers users have explicit access to folders only. As illustrated in the graphic below, folders are hierarchically created under the container. In cases where several folders exist, a security group access control can be implemented on a specific folder. Access is granted to the folder where the access control is applied. This prevents users from seeing or navigating folders under the container other than the folders where an explicit access control is applied. When users attempt to browse the container, authorization will fail.

Microsoft Azure Storage Explorer grants access to sub-folder only using the sub-share access control model. Members are added to the sub-share group, not the container group. The sub-share group is nested in the container group with execute permissions to allow for Read and Write on the sub-share.

This model fits for scenarios where group members get Read, Write, and Execute permissions to a sub-folder only. The authorization allows users to upload, download, create folders/files, and delete folders/files. The access control is specific to the folder “project1.” In this model you can have multiple folders under the container, but only provide authorization to a specific folder.

The sub-share process is only applied if a sub-share is needed.

Any folder needing explicit authorization is considered a sub-share.
We apply a sub-share security group access control with Read, Write, and Execute on the folder.
We nest the sub-share security group in the parent share security group used for Execute only. This allows members who do not have access to the container enough authorization to Read, Write, and Execute the specific sub-share folder without having Read or Write permissions to any other folders in the container.

Applying access controls for each type of share (container and or sub-share)

The parent share process is standard for each storage account.

Each storge account has a unique security group. This security group will have access control applied for any containers. This allows data owners to add members and effectively give access to all containers (current and future) by simply changing the membership of one group.
Each container will have a unique security group for Read, Write, and Execute. This security group is used to isolate authorization to a single container.
Each container will have a unique group for execute. This security group is needed in the event sub-shares are created. Sub-shares are folder-specific shares in the hierarchical namespace.
We always use the default access control option. This is a feature that automatically applies the parent permissions to all new child folders (sub-folders).

The first design enabled us to offer MDTS while our engineers defined, designed, and developed an improved experience for all the personas. It quickly became evident that Storage Admins needed the ability to see an aggregate view of all storage actions in near real-time to successfully operate the service. It was important for our administrators to easily discover the most active accounts and which user, service principle, or managed service identity was making storage requests or performing storage actions. In July 2020, we added the Aggregate Storage Actions service.

Adding aggregate storage actions

For our second MDTS design, we augmented the out of the box Microsoft Azure Storage capabilities used in our first design with the capabilities of Microsoft Azure Monitor, Event Hubs, Stream Analytics, Function Apps, and Microsoft Azure Data Explorer to provide aggregate storage actions. Once the Aggregate Storage Actions capability was deployed and configured within MDTS, storage admins were able to aggregate the storage actions of all their storage accounts and see them in a single pane view.

The second design of MDTS introduces aggregate storage actions.

The Microsoft Azure Storage Diagnostic settings in Microsoft Azure Portal makes it possible for us to configure specific settings for blob actions. Combining this feature with other Azure Services and some custom data manipulation gives MDTS the ability to see which users are performing storage actions, what those storage actions are, and when those actions were performed. The data visualizations are near real-time and aggregated across all the storage accounts.

Storage accounts are configured to route logs from Microsoft Azure Monitor to Event Hub. We currently have 45+ storage accounts that generate around five million logs each day. Data filtering, manipulation, and grouping is performed by Stream Analytics. Function Apps are responsible for fetching UPNs using Graph API, then pushing logs to Microsoft Azure Data Explorer. Microsoft Power BI and our modern self-service portal query Microsoft Azure Data Explorer and provide the visualizations, including dashboards with drill down functionality. The data available in our dashboard includes the following information aggregated across all customers (currently 35 storage accounts).

Aggregate view of most active accounts based on log activity.
Aggregate total of GB uploaded and download per storage account.
Top users who uploaded showing the user principal name (both external and internal).
Top users who downloaded showing the user principal name (both external and internal).
Top Accounts uploaded data.
Top Accounts downloaded data.

The only setting required to onboard new storage accounts is to configure them to route logs to the Event Hub. Because we can have an aggregate store of all the storage account activities, we are able to offer MDTS customers an account view into their storage account specific data.

Following the release of Aggregate Storage Actions, the MDTS team, along with feedback from customers, identified another area of investment—the need for storage customers to “self-service” and view account specific insights without having role-based access to the subscription or storage accounts.

Providing a self-service experience

To enhance the experience of the other personas, MDTS is now focused on the creation of a Microsoft Azure web portal where customers can self-service different storage and transfer capabilities without having to provide any Microsoft Azure Role Based Access (RBAC) to the underlying subscription that hosts the MDTS service.

When designing MDTS self-service capabilities we focused on meeting these primary goals:

Make it possible for Microsoft Azure Subscription owners (Storage Admins) to provide the platform and services while not needing to be in the middle of making changes to storage and transfer services.
The ability to create custom persona experiences so customers can achieve their storage and transfer goals through a single portal experience in a secure and intuitive way. Some of the new enterprise scale capabilities include:
- Onboarding.
- Creating storage shares.
- Authorization changes.
- Distributions. Automating the distribution of data from one source to one or multiple destinations.
- Provide insights into storage actions (based on the data provided in Storage Actions enabled in our second MDTS release).
- Reporting basic consumption data, like the number of users, groups, and shares on a particular account.
- Reporting the cost of the account
As Azure services and customer scenarios change, the portal can also change.
If customers want to “self-host” (essentially take our investments and do it themselves), we will easily be able to accommodate.

Our next design of MDTS introduces a self-service portal.

Storage consumer user experiences

After storage is created and configured, data owners can then share steps for storage consumers to start using storage. Upload and download are the most common storage actions, and Microsoft Azure provides software and services needed to perform both actions for manual and automated scenarios.

Microsoft Azure Storage Explorer is recommended for manual scenarios where users can connect and perform high speed uploads and downloads manually. Both Microsoft Azure Data Factory and AzCopy can be used in scenarios where automation is needed. AzCopy is heavily preferred in scenarios where synchronization is required. Microsoft Azure Data Factory doesn’t provide synchronization but does provide robust data copy and data move. Azure Data Factory is also a managed service and better suited in enterprise scenarios where flexible triggering options, uptime, auto scale, monitoring, and metrics are required.

Using Microsoft Azure Storage Explorer for manual storage actions

Developers and Storage Admins are accustomed to using Microsoft Azure Storage Explorer for both storage administration and routine storage actions (e.g., uploading and downloading). Non-storage admin, otherwise known as Storage Consumers, can also use Microsoft Azure Storage Explorer to connect and perform storage actions without needing any role-based access control or access keys to the storage account. Once the storage is authorized, members of authorized groups can perform routine steps to attach the storage they are authorized for, authenticating with their work email, and leveraging the options based on their authorization.

The processes for sign-in and adding a resource via Microsoft Azure Active Directory are found in the Manage Accounts and Open Connect Dialogue options of Microsoft Azure Storage Explorer.

After signing in and selecting the option to add the resource via Microsoft Azure Active Directory, you can supply the storage URL and connect. Once connected, it only requires a few clicks to upload and download data.

Microsoft Azure Storage Explorer Local and Attached module. After following the add resource via Microsoft Azure AD process, the Azure AD group itshowcase-engineering is authorized to Read, Write, and Edit (rwe) and members of the group can perform storage actions.

To learn more about using Microsoft Azure Storage Explorer, Get started with Storage Explorer. There are additional links in the More Information section at the end of this document.

Note: Microsoft Azure Storage Explorer uses AzCopy. Having AzCopy as the transport allows storage consumers to benefit from high-speed transfers. If desired, AzCopy can be used as a stand-alone command line application.

Using AzCopy for manual or automated storage actions

AzCopy is a command line interface used to perform storage actions on authorized paths. AzCopy is used in Microsoft Azure Storage Explorer but can also be used as a standalone executable to automate storage actions. It’s a multi-stream TCP based transport capable of optimizing throughput based on the bandwidth available. MDTS customers use AzCopy in scenarios which require synchronization or cases where Microsoft Azure Storage Explorer or Microsoft Azure Data Factory copy activity doesn’t meet the requirements for data transfer. For more information about using AzCopy please see the More Information section at the end of this document.

AzCopy is a great match for standalone and synchronization scenarios. It also has options that are useful when seeking to automate or build applications. Because AzCopy is a single executable running on either a single client or server system, it isn’t always ideal for enterprise scenarios. Microsoft Azure Data Factory is a more robust Microsoft Azure service that meets the most enterprise needs.

Using Microsoft Azure Data Factory for automated copy activity

Some of the teams that use MDTS require the ability to orchestrate and operationalize storage uploads and downloads. Before MDTS, we would have either built a custom service or licensed a third-party solution, which can be expensive and/or time consuming.

Microsoft Azure Data Factory, a cloud-based ETL and data integration service, allows us to create data-driven workflows for orchestrating data movement. Including Azure Data Factory in our storage hosting service model provided customers with a way to automate data copy activities. MDTS’s most common data movement scenarios are distributing builds from a single source to multiple destinations (3-5 destinations are common).

Another requirement for MSTS was to leverage private data stores as a source or destination. Microsoft Azure Data Factory provides the capability to use a private system, also known as a self-hosted integration runtime. When configured this system can be used in copy activity communicating with on-premises file systems. The on-premises file system can then be used as a source and/or destination datastore.

In the situation where on-premises file system data needs to be stored in Microsoft Azure or shared with external partners, Microsoft Azure Data Factory provides the ability to orchestrate pipelines which perform one or multiple copy activities in sequence. These activities result in end-to-end data movement from one on-premises file systems to Microsoft Azure Storage, and then to another private system if desired.

The graphic below provides an example of a pipeline orchestrated to copy builds from a single source to several private destinations.

Microsoft Azure Data Factory pipeline example. Private site 1 is the build system source. Build system will build, load the source file system, then trigger the Microsoft Azure Data Factory pipeline. Build is then uploaded, then Private sites 2, 3, 4 will download. Function apps are used for sending email notifications to site owners and additional validation.

For more information on Azure Data Factory, please see Introduction to Microsoft Azure Data Factory. There are additional links in the More Information section at the end of this document.

If you are thinking about using Microsoft Azure to develop a modern data transfer and storage solution for your organization, here are some of the best practices we gathered while developing MDTS.

Close the technical gap for storage consumers with a white glove approach to onboarding

Be prepared to spend time with customers who are initially overwhelmed with using Azure Storage Explorer or AzCopy. At Microsoft, storage consumers represent a broad set of disciplines—from engineers and developers to project managers and marketing professionals. Azure Storage Explorer provides an excellent experience for engineers and developers but can be a little challenging for less technical roles.

Have a standard access control model

Use Microsoft Azure Active Directory security groups and group nesting to manage authorization; Microsoft Azure Data Lake Gen2 storage has a limit to the number of Access Controls you can apply. To avoid reaching this limit, and to simplify administration, we recommend using Microsoft Azure Active Directory security groups. We apply the access control to the security group only, and in some cases, we nest other security groups within the access control group. We nest Member Security Groups within Access Control Security Groups to manage access. These group types don’t exist in Microsoft Azure Active Directory but do exist within our MDTS service as a process to differentiate the purpose of a group. We can easily determine this differentiation by the name of the group.

Access Control Security Groups: We use this group type for applying Access Control on ADLS Gen2 storage containers and/or folders.
Member Security Groups: We use these to satisfy cases where access to containers and/or folders will constantly change for members.

When there are large numbers of members, nesting prevents the need to add members individually to the Access Control Security Groups. When access is no longer needed, we can remove the Member Group(s) from the Access Control Security Group and no further action is needed on storage objects.

Along with using Microsoft Azure Active Directory security groups, make sure to have a documented process for applying access controls. Be consistent and have a way of tracking where access controls are applied.

Use descriptive display names for your Microsoft Azure AD security groups

Because Microsoft Azure AD doesn’t currently organize groups by owners, we recommend using naming conventions that capture the group’s purpose and type to allow for easier searches.

Example 1: mdts-ac-storageacct1-rwe. This group name uses our service standard naming convention for Access Control group type on Storage Account 1, with access control Read, Write, and Execute. mdts = Service, ac = Access Control Type, storageacct1 = ADLS Gen2 Storage Account Name, rwe = permission of the access control.
Example 2: mdts-mg-storageacct1-project1. This group name uses our service standard naming convention for Member Group type on Storage Account 1. This group does not have an explicit access control on storage, but it is nested in mdts-ac-storageacct1-rwe where any member of this group has the Read, Write, and Execute access to storage account1 because it’s nested in mdts-ac-storageacct1-rwe.

Remember to propagate any changes to access controls

Microsoft Azure Data Lake Gen2 storage, by default, doesn’t automatically propagate any access control changes. As such, when removing, adding, or changing an access control, you need to follow an additional step to propagate the access control list. This option is available in Microsoft Azure Storage Explorer.

Storage Consumers can attempt Administrative options

Storage Consumers use Microsoft Azure Storage Explorer and are authenticated with their Microsoft Azure Active Directory user profile. Since Azure Storage Explorer is primarily developed for Storage Admin and Developer personas, all administrative actions are visible. It is common for storage consumers to attempt administrative actions, like managing access or deleting a container. Those actions will fail due to only being accessed via access control lists (ACLs). There isn’t a way to provide administration actions via ACL’s. If administrative actions are needed, then users will become a Storage Admin which has access via Azure’s Role Based Access Control (RBAC).

Microsoft Azure Storage Explorer and AzCopy are throughput intensive

As stated above, AzCopy is leveraged by Microsoft Azure Storage Explorer for transport actions. When using Azure Storage Explorer or AzCopy it’s important to understand that transfer performance is its specialty. Because of this, some clients and/or networks may benefit from throttling AzCopy’s performance. In circumstances where you don’t want AzCopy to consume too much network bandwidth, there are configurations available. In Microsoft Azure Storage Explorer use the Settings option and select the Transfers section to configure Network Concurrency and/or File Concurrency. In the Network Concurrency section, Adjust Dynamically is a default option. For AzCopy, there are flags and environment variables available to optimize performance.

For more information, visit Configure, optimize, and troubleshoot AzCopy.

Microsoft Azure Storage Explorer sign-in with MSAL

Microsoft Authentication Library, currently in product preview, provides enhanced single sign-on, multi-factor authentication, and conditional access support. In some situations, users won’t authenticate unless MSAL is selected. To enable MSAL, select the Setting option from Microsoft Azure Storage Explorer’s navigation pane. Then in the application section, select the option to enable Microsoft Authentication Library.

B2B invites are needed for external accounts (guest user access)

When there is a Microsoft business need to work with external partners, leveraging guest user access in Microsoft Azure Active Directory is necessary. Once the B2B invite process is followed, external accounts can be authorized by managing group membership. For more information, read What is B2B collaboration in Azure Active Directory?

We used Microsoft Azure products and services to create an end-to-end modern data transfer and storage service that can be used by any group at Microsoft that desires cloud data storage. The release of Microsoft Azure Data Lake Gen 2, Microsoft Azure Data Factory, and the improvements in the latest release of Azure Storage Explorer made it possible for us to offer MDTS as a fully native Microsoft Azure service.

One of the many strengths of using Microsoft Azure is the ability to use only what we needed, as we needed it. For MDTS, we started by simply creating storage accounts, requesting Microsoft Azure Active Directory Security Groups, applying an access control to storage URLs, and releasing the storage to customers for use. We then invested in adding storage actions and developed self-service capabilities that make MDTS a true enterprise-scale solution for data transfer and storage in the cloud.

We are actively encouraging the adoption of our MDTS storage design to all Microsoft engineering teams that still rely on legacy storage hosted in the Microsoft Corporate network. We are also encouraging any Microsoft Azure consumers to consider this design when evaluating options for storage and file sharing scenarios. Our design has proven to be scalable, compliant, and performant with the Microsoft Zero Trust security initiative, handling extreme payloads with high throughput and no constraints on the size or number of files.

By eliminating our dependency on third-party software, we have been able to eliminate third-party licensing, consulting, and hosting costs for many on-premises storage systems.

Are you ready to learn more? Sign up for your own Microsoft Azure subscription and get started today.

To receive the latest updates on Azure storage products and features to meet your cloud investment needs, visit Microsoft Azure updates.

The post Providing modern data transfer and storage service at Microsoft with Microsoft Azure appeared first on Inside Track Blog.

Using Azure Multi-Factor Authentication at Microsoft to enhance security

Inside Track staff — Thu, 01 Dec 2022 21:45:52 +0000

To address the increasing security risk of phishing emails and fake web pages that are designed to harvest user names and passwords, Microsoft Digital Employee Experience (MDEE) accelerated the adoption of Microsoft Azure Multi-Factor Authentication for all users at Microsoft. We already had multi-factor authentication for remote access and virtual private network (VPN), in the form of virtual and physical smart cards. But to improve security and better support mobile productivity, we needed an option that provided:

Additional security for federated identities that are used to access on-premises resources and cloud-based services.
Multi-factor authentication capability for approximately 190,000 users and over 300,000 mobile devices that are not set up to use smart cards, or when a user does not have their smart card.

For a transcript, please view the video on YouTube: https://www.youtube.com/watch?v=zrAlq0BpNao, select the “More actions” button (three dots icon) below the video, and then select “Show transcript.”

Learn how Microsoft moved most of its applications off the corporate network and to the internet using Microsoft Azure and Microsoft Office 365.

Integrating on-premises identities

To enable a single user identity for authentication and a unified experience when accessing resources in the cloud and on-premises, we integrated our on-premises Active Directory forests with Microsoft Azure Active Directory (Azure AD). We use Microsoft Azure AD Connect and Active Directory Federation Services (AD FS), so when an Azure-based application needs user attributes—for example, their location, organization, or job title—that information is available as long as the service has the right permissions to query for those attributes.

Setting up Microsoft Azure Multi-Factor Authentication

To further secure user identities, we enabled Microsoft Azure Multi-Factor Authentication as an additional verification method that is sent to the user. Our verification options include a phone call or mobile app notification, and the user can select the preferred option at the time of enrollment. The user experience is based on the connectivity type, so when the user connects remotely they are prompted for a second verification. For critical services, we require multi-factor authentication for access, even for connections within the corporate network.

Enrolling users

For our sign-in experience, we have enabled signing in with mobile phone and signing in with mobile app notification. Our corporate policies required that the user identity be validated for enrollment. The enrollment process used a portal designed to enable users to validate their phone number automatically when they sign in with a smart card during registration. The identity of users without smart cards is validated using a workflow that requires their manager’s approval.

Customizing a user-friendly sign-in screen

We enabled Microsoft Azure Multi-Factor Authentication in a phased manner, and gathered feedback from our early adopters through Yammer communities and from user support about the experience. Based on feedback, we chose to customize the AD FS sign-in page to create an intuitive and self-guided user experience before deploying to the rest of our users. Early feedback helped us improve the experience where additional clarity was needed to guide users, such as which sign-in option to select or which certificate to choose for authentication.

We were able to combine sign-in screens and update them to drive preferred user behavior. While our default option for the user is to sign in with a smart card, in cases where username and password was selected we achieved the strong authentication with Microsoft Azure Multi-Factor Authentication. The interface is also updated to detect and display only valid authentication options. If a user is on a device that can only support phone or app verification, the user will not see physical smart card as their primary option to sign in. The option to sign on with a username and password using phone verification as the second factor is available on the screen but is a smaller option that needs to be purposefully selected.

More self-help options are provided in the sign-in failure screen to guide users through the steps to resolve their issue. If none of the steps resolve the user’s issue, then they are given contact information for the global helpdesk.

Learn more about customizing AD FS sign-in pages at Customizing the AD FS Sign-in Pages.

Additional scenarios

We enabled a few additional scenarios to help improve the experience for our users, reduce the amount of helpdesk calls, and improve the performance of the service.

Securing remote access

For remote access, our VPN infrastructure has long required a physical or virtual smart card to sign in securely. With the addition of Microsoft Azure Multi-Factor Authentication, we can integrate with our existing VPN/RADIUS infrastructure and users can also use phone or mobile app verification to sign in. This includes making this option available within our Connection Manager—a component of Microsoft Windows Server-based VPN client. We were able to give users the choice of using their preferred strong authentication method for remote access. This has helped users gain remote access faster in locations where it takes a long time to get smart cards.

Changing passwords

Microsoft users can now change their passwords using an internal, cloud-based, self-service password management solution. We have integrated Azure Multi-Factor Authentication, including verification with a phone call or mobile app, as part of the process. Users are prompted to answer additional verification questions when they change a password. When users need to change their password, they can now do so without having to call the global helpdesk.

Enabling performance and high availability

Our Microsoft Azure Multi-Factor Authentication servers are configured with Windows Server 2012 R2 AD FS. To provide high availability and redundancy, we do not direct authentication traffic to the primary Multi-Factor Authentication server. This helps ensure that the server can make updates without having performance issues.

Our distributed secondary Multi-Factor Authentication servers store a read-only copy of the Multi-Factor Authentication configuration database from the primary server. They connect to and synchronize data with the primary server, provide fault tolerance, and load balance access requests. Since Azure Multi-Factor Authentication Server is not running on the same servers as AD FS, we have installed the Multi-Factor Authentication adapter for AD FS locally on servers running AD FS. Each virtual adapter is configured for certificate authentication to the web service SDK on the Multi-Factor Authentication server.

For load balancing, we use a combination of DNS round robin and hardware. To learn more about steps to install Azure Multi-Factor Authentication Server see Getting started with the Microsoft Azure Multi-Factor Authentication Server.

Monitoring service health

To monitor service health and performance, we developed a synthetic client flow through the Multi-Factor Authentication and AD FS infrastructures. We used a test account without any rights to live applications or resources on the corporate network to run synthetic transactions that tested the end-to-end client flow. Using a constant stream of synthetic transactions per day, we can quickly identify degradations in the service and get them fixed before users are affected. We use Azure AD Connect Health for detailed monitoring, reporting, and alerts for our AD FS servers. Azure AD Connect Health, a feature of Azure Active Directory Premium helps monitor and secure cloud and on-premises identity infrastructure. Using Azure AD Connect Health with AD FS has more information about Azure AD Connect Health.

We also use real-time metrics in Visual Studio Application Insights to analyze request load, server performance counters, and response times across dependencies. It helps us diagnose exceptions and failed requests, correlate them with events and traces, and get multi-dimensional analyses over standard metrics. Visual Studio Application Insights helps us determine the cause of any performance behavior through ad hoc queries. Learn more about how to detect, triage, and diagnose issues in your web apps and services with Visual Studio Application Insights.

The fraud alert feature has been configured and set up so that our users can report fraudulent attempts when trying to access resources. When a fraudulent attempt is reported, we have the ability to investigate and take immediate action, such as locking out an account, which is included in our service reports.

We consolidated the text-based logging from the Multi-Factor Authentication servers into a single database, and the support team uses scripts to query against those reports. We also created Microsoft SQL Server Reporting Services reports for the support team to look at bigger issues.

Reporting provides a view into what kinds of issues are occurring and what can be done to provide a better user experience. Reports were used by service management to help them spot trends during the rollout. After the rollout, service managers monitored the user experience service health reports for telemetry about how many people were using the phone call for phone authentication, and how many people were using the mobile application. The service health report also lets service managers know how many users have authenticated on a given day and how the service is performing.

Conditional access control

We have the ability to provide conditional access to applications with AD FS rules. We can vary the granularity of how we enforce multi-factor authentication, at an application level. AD FS is flexible and allows us to designate people or groups that can access an application, and how they have to authenticate when they are on the corporate network or off. Most users that access applications on the corporate network are allowed single authentication, and applications that are accessed from the internet require multi-factor authentication. Some applications are so important that they require multi-factor authentication even when the user accesses them from the corporate network.

Improve manageability. Common reporting with other services and integration into a reporting dashboard provides an end-to-end view of all services.
Focus on the user experience. This is particularly important when it comes to security initiatives and supporting broad adoption. Leadership and users can be resistant to change if there is a perception that new security measures may degrade the user experience.
Use synthetic transactions to regularly test the performance of your Azure Multi-Factor Authentication environment. This will help identify any degradations in the service early enough to address them before users are affected.
Create test accounts for synthetic transactions. By using accounts that do not have access to live applications or resources, you can help keep information secure while testing service performance of your environment.
Communicate broadly about upcoming changes and make them as minimally intrusive as possible for users. User awareness and readiness is key to change management. We used a combination of social channels on Yammer, an email campaign, print collateral, posters, and digital signage.

The post Using Azure Multi-Factor Authentication at Microsoft to enhance security appeared first on Inside Track Blog.

How Microsoft used SQL Azure and Azure Service Fabric to rebuild a key internal app

Lukas Velush — Wed, 16 Oct 2019 20:23:57 +0000

When Raja Narayan took over supporting the Payee Management Application that Microsoft Finance uses to onboard new suppliers and partners, the experience was broken.

“Our application’s infrastructure was on-premises,” Narayan says. “It was a big, old-school architecture monolith and, although we had database-based logging in place, there was no alerting setup at any level. Bugs and infrastructure failures were bringing the application down, but we didn’t know when this happened.”

And it went down a lot.

When it did, the team wouldn’t know until a user filed a ticket. Then it would take four to six hours before the ticket reached Narayan’s team.

“We would undertake root-cause investigation and it sometimes could take a solid two to three hours, if not more in rare cases, until we managed to eventually identify and resolve the problem,” says Narayan, a principal software engineer on the Microsoft Digital group that supports the Microsoft Finance team.

[Take a look at how Narayan’s broader team is modernizing applications.]

All told, it would take at least 10 to 12 hours to bring the system back online.

And it wasn’t only the reliability challenges that the team was hit with daily. Updates and fixes required taking the system down. Engineering teams didn’t have insight into work that other teams were doing. Cross-discipline collaboration was minimal. Continuous repetitive manual work was required. And telemetry was severely limited.

“There was no reliability at all,” Narayan says. “The user experience was very, very bad.”

That was four years ago, before the team moved its payee management system and its 95,000 active supplier and partner accounts to the cloud.

“When I joined our team, it was obvious that we needed a change. And going to Azure was a big part of it,” Narayan says. “Going to the cloud was going to open up new opportunities for us.”

He was right. After the nine-month migration was finished, things got much better right away. The benefits included:

The team was empowered to adopt modern, DevOps engineering practices, something they really wanted. The benefits showed up in many ways, including reduced cross-team friction and faster response times.
Failures were reported to a Directly Responsible Individual (DRI) immediately. They would fix the problem right away or queue it up for the engineering team to do deeper-level work.
The time to fix major production issues dropped to as few as 15 minutes, and a maximum of four hours.
The team no longer needed to shut down the system to make production fixes (thanks to the availability of staging and production slots, and hosting frameworks like Azure Service Fabric).
Application reliability shot up from around 95 percent to 99 percent. Availability stayed high because of redundancy.
Scaling the application up and out became just a configuration away. The team was able to scale the services based on memory and processor utilization.
The application’s telemetry data became instantly available to analyze and learn from.
The team could start taking advantage of automation and governance capabilities.

The shift to Azure is having a lasting impact.

“If someone asked me to go back, I don’t think I could happily do it,” Narayan says. “I don’t know how we survived in those old days. It’s so much faster and more powerful to be on Azure.”

Instead of spending all his time fighting to reduce technical debt, on building and maintaining too many services, and buying and installing technical infrastructure, he’s now focused on what his internal business customers need.

“Now we’re building a program,” Narayan says. “Now we’re taking care of our customers. Application-hosting infrastructure is not our concern now. Azure takes care of it.”

Opening doors with SQL Azure

Moving to the cloud also meant the team got to move on from an on-premises SQL Server database that needed continuous investment in optimization and maintenance to avoid problems with performance.

“We’ve never had an incident where our SQL Azure database has gone down,” Narayan says. “When we were on-prem, our work was often interrupted by accidental server restarts and patch installations.”

The team no longer needs to shut the application down and reboot the server when it wants to fix something or make an upgrade. “Every time we want to do something new, we make a couple of clicks, and boom, we’re done,” he says.

Azure SQL made it much easier to scale up and down when user loads changed. “My resources are so elastic now,” Narayan says. “I can shrink and expand based on my need—it’s a matter of sliding the scrollbar.”

Moving the application’s database to SQL Azure has given the team access to several new tools.

“With our move to cloud, the team can experiment on any databases, something that wasn’t possible before,” Narayan says. “Before we could only use SQL Server. Now we have an array of options such as Cosmos DB, table storage, MySQL, and PostgreSQL. New features from these products are available automatically to us. We don’t have to install feature updates and patches—it’s all managed by Azure.”

Living in the cloud also gives the team new access to the application’s data.

“We now live in this new big-data world,” Narayan says. “We can now get a lot of insights about our application, especially with machine learning and AI.”

For example, SQL Azure learns from the incoming load and accordingly tunes itself. Indexes are created or dropped based on how it learns. “This is one of the most sought-after features by our team,” he says. “This feature does what a database administrator used to have to do by hand.”

And processing the many tiny transactions that come through Narayan’s application? Those all happen much faster now as well.

“For Online Analytic Processing (OLAP), we need big processing machines,” he says. “We need big resources.”

Azure provides him with choices, including Azure Datawarehouse, Azure Databricks, and Azure HDInsights. “If I was still on-prem, this kind of data processing would just be a dream for me,” he says. “Now they are a click away for me.”

Going forward, the plan is to use AI and machine learning to analyze Payee Management Application’s data at greater depth. “There is a lot more we can do with our data,” Narayan says. “We’re just getting started.”

Narayan’s journey toward more reliable and agile service is a typical example of how off-loading the work of managing complex on-premises infrastructure can help the company’s internal and external customers focus on their core businesses, says Eli Birova, a site-reliability engineer on the Azure SQL SRE Team.

“And one of the biggest values Azure SQL DB brings is a database in the Azure cloud that scales in and out together with your business need and adapts to your workload,” Birova says.

That provides customers like Narayan and his team with a database as a service tailored by the deep Relational Database Management Systems (RDBMS) engineering expertise that come from long years of developing Microsoft SQL Server, she says. It’s a service that incorporates large-scale distributed systems design and implementation best practices, which also natively leverages the scalability and resiliency mechanisms of the Azure stack itself.

“We in the Azure SQL DB team are continuously monitoring and analyzing the behavior of our services and the experience our customers have with us,” Birova says. “We’re very focused on identifying and implementing improvements to our feature set, reliability, and performance. We want to make sure that every customer can rely on their data when and as they need it, and that they can count on their server being up to date and secure without needing to invest their own engineering resources into managing on-premises database infrastructure.”

Harnessing the power of Azure Service Fabric

Once Narayan’s team finished migrating the Payee Management Application to the cloud, it got the breathing room it needed to start thinking bigger.

“We started asking ourselves, ‘How can we get more out of being in the cloud?’” Narayan says. “It didn’t take us long to realize that the best way to take advantage of everything Azure had to offer would be to modify our application from the ground up to be cloud-native.”

That shift in thinking meant that his days of running a massive, clunky, monolithic application were numbered.

“We realized we could use Azure Service Fabric to rebuild the application as a suite of microservices,” Narayan says. “We could get an entirely fresh start.”

Azure Service Fabric is part of an evolving set of tools that the Azure product group is using to help customers—including power users inside Microsoft—build and operate always-on, scalable, distributed apps like the one Narayan’s team manages. So says Spencer Schwab, a software engineering manager on the Microsoft Azure Site Reliability Engineering (SRE) team.

“We’re learning from the experience Raja and his team are having with Service Fabric,” Schwab says. “We’re pumping those learnings back into the product so that our customers have the best experience possible when they choose to bet their businesses on us.”

Narayan’s team is using Azure Service Fabric to gradually rebuild the Payee Management Application without interrupting service to customers. That’s something possible only in the cloud.

“We lifted and shifted all of the old, existing monolith components into Azure Service Fabric,” he says. “Containerizing it like that has allowed us to gradually strangle the older application.”

Each component of the old application is docked in a container. Each is purposefully placed next to the microservice that will replace it.

“Putting each microservice next to the component that it’s replacing allows us to smoothly move that bit of workload to the new microservice without shutting down the larger application,” Narayan says. “This is making our journey to microservices pleasant.”

The team is halfway finished.

“So far we have 12 microservices, and we’re planning to expand up to 25,” he says.

Once the team is done, the team can then truly take advantage of being in the cloud.

“We’ll be ready to reap the benefits of cloud-native development,” Narayan says. “Anything becomes possible at that point.”

The post How Microsoft used SQL Azure and Azure Service Fabric to rebuild a key internal app appeared first on Inside Track Blog.