security Archives - Inside Track Blog http://approjects.co.za/?big=insidetrack/blog/tag/security/ How Microsoft does IT Tue, 24 Sep 2024 20:23:31 +0000

Harnessing first-party patching technology to drive innovation at Microsoft http://approjects.co.za/?big=insidetrack/blog/harnessing-first-party-patching-technology-to-drive-innovation-at-microsoft/ Mon, 16 Sep 2024 15:00:45 +0000

The post Harnessing first-party patching technology to drive innovation at Microsoft appeared first on Inside Track Blog.

Microsoft Digital stories

We live in a world where network security is a foundational concern for large enterprises like ours that are trusted with sensitive customer data. This creates an environment where we all need to ensure that we have high patching compliance across our massive array of devices. This complexity requires that we continuously improve our patching tools and solutions.

Layered on top of that, our need for device security exists within a complex matrix of software, hardware, and user interfaces. If our employees are running out-of-date software, they’re leaving their device and our network unsecured and vulnerable.

Every leader understands the extreme importance of keeping their data secure. No enterprise wants to be the next company that gets exposed by one of these hacks that has happened in the past and to lose sensitive business or customer data.

—Biswa Jaysingh, principal product manager, Microsoft Digital Employee Experience

Ruana, Jaysingh, and Damkewala pose for portraits in a montage of three images.
Christine Ruana (left), Biswa Jaysingh (center), and Jamshed Damkewala are among those helping Microsoft transform how it does first-party patching. Ruana is principal program manager for Microsoft Visual Studio responsible for enterprise deployments and updates of Visual Studio, Jaysingh is a principal product manager on our Microsoft Digital Employee Experience team, and Damkewala is a principal PM manager on the Platforms and Languages team responsible for .NET.

This is especially true when developers use powerful first-party tools like Microsoft Visual Studio and developer platforms like .NET to build new software. With developer platforms like .NET, this becomes even more critical because .NET is not just deployed to developer machines, it is also installed on the computers where the developed application will run.

Here at Microsoft Digital Employee Experience, the organization that powers, protects, and transforms the company, we are committed to holistically improving patching compliance rates across the company. To ensure we are improving security at every level of Microsoft’s infrastructure, from software and devices to the networks themselves, we are utilizing new technology and new approaches that we develop internally within our organization and within our product group partners.

“Every leader understands the extreme importance of keeping their data secure,” says Biswa Jaysingh, a principal product manager with Microsoft Digital Employee Experience. “No enterprise wants to be the next company that gets exposed by one of these hacks that has happened in the past and to lose sensitive business or customer data.”

Recent innovations in first-party patching technology at Microsoft, including in Windows Update for Business, Microsoft Endpoint Manager, and Microsoft Defender for Endpoint, are allowing us to unlock unprecedented levels of security across our network while reducing costs and shortening deployment timelines. From consolidating multiple deployments to reducing the impact of reboots on users, our changes are producing efficiencies across the business.

Within the matrix of network security at Microsoft, there are several critical arenas for security admins to monitor, patch, and secure. Malicious actors are looking at the full tech stack for vulnerabilities, which means our teams must cover devices at every level, from the operating system and first-party software to hardware and third-party software.

[Discover boosting Windows internally at Microsoft with a transformed approach to patching.]

Reacting to the growing threat to first-party software

In the modern cloud-connected world there is more surface area that we need our IT professionals to protect. With more and more devices, from Internet of Things devices to peripherals, having internet access, there is far greater potential for bad actors to break in. It’s more important than ever to stay secure, which means update compliance must be as close to 100 percent as possible across all levels of a device.

“The last thing we want is for Microsoft to ship a fix for a vulnerability, but an enterprise isn’t able to adopt the update. That would leave them insecure,” says Christine Ruana, principal program manager for Microsoft Visual Studio, who is responsible for enterprise deployments and updates of Visual Studio.

This passion for effectively securing networks led Microsoft leaders like Ruana to ensure they’re doing everything possible to ease the burden of patching on our teams here at Microsoft and for our external customers. “Visual Studio’s recent Administrator update solution makes it much easier for enterprises to deploy updates through Microsoft Endpoint Manager,” Ruana says.

At the start of the .NET journey we were seeing unacceptable compliance rates as developers were using the software in ways that we hadn’t anticipated. This increased the complexity for maintaining patching compliance. We had to create paths for updating both current builds of .NET through Visual Studio and for keeping older builds compliant through Microsoft Update. This has improved compliance rates considerably.

—Jamshed Damkewala, principal PM manager, Platforms and Languages team

We’re using Microsoft Defender for Endpoint to manage the health of our devices, which is helping us improve the security of our network while also improving the user experience for our employees and our admins. Every efficiency gained along the way makes it more likely that compliance rates will grow. Teams are working around the clock to identify and patch vulnerabilities, but this work is only as effective as the compliance rate is strong.

A better experience for admins and users alike

We in the Microsoft Digital Employee Experience organization began our journey to transform the way we do patching by making it easier for our IT admins to deploy patches across our network.

Until recently, the first-party patching regime at Microsoft required a slew of software solutions to be manually managed, including important software applications like Visual Studio and .NET. But in November 2022, we were able to migrate numerous critical patch deployments to Windows Update for Business, dramatically increasing the timeliness and accuracy of device patching.

“At the start of the .NET journey we were seeing unacceptable compliance rates as developers were using the software in ways that we hadn’t anticipated,” says Jamshed Damkewala, principal PM manager on the Platforms and Languages team responsible for .NET. “This increased the complexity for maintaining patching compliance. We had to create paths for updating both current builds of .NET through Visual Studio and for keeping older builds compliant through Microsoft Update. This has improved compliance rates considerably.”

We gain significant efficiencies as we eliminate manual deployments through automation and streamline the rollout of patches through Windows Update and Windows Update for Business. With these universal sources for patches, we simultaneously reduce time for testing while reducing errors in the deployments.

With more accurate updates meeting user devices more quickly and hitting all builds of first-party software that require patching, our networks are more secure than ever. The ease of patches deploying on devices also reduces the impact on users, so they are more likely to remain compliant while experiencing minimal disruption.

These innovations are not custom built for Microsoft. We are effectively leveraging technology that we already had to make it more efficient and effective for teams to patch their software.

—Harshitha Digumarthi, senior product manager, Microsoft Digital Employee Experience

Furthermore, the technology within Microsoft Defender for Endpoint allows for thorough device scanning, providing effective telemetry that admins can react to and better knowledge for engineering future patches and policies for Windows Update for Business, which further grows compliance rates. We use it to scan for and report vulnerabilities, which empowers our admins to respond faster. Microsoft Endpoint Manager also allows our admins to better manage Windows Update for Business policies.

Providing the tools for teams to succeed

Internally here at Microsoft, our updated technology allows us to monitor our networks more efficiently, providing detailed telemetry about device health that we’ve never had before. This visibility allows us to develop new protocols for our networks, including complicated cases of end-of-life devices and end-of-service software.

But the true efficiency unlock comes from how these systems were designed, constructed, and automated.

“These innovations are not custom built for Microsoft,” says Harshitha Digumarthi, a senior product manager responsible for improving the patching experience at Microsoft Digital Employee Experience. “We are effectively leveraging technology that we already had to make it more efficient and effective for teams to patch their software.”

This approach reduces cost, increases the speed of development, and fundamentally improves the efficiencies of teams deploying mission-critical patches for their software. Potential errors caused by manual deployment are eliminated and the single update source on a single day per month improves the user experience considerably. The result is a more secure network through increased device compliance.

These benefits are compounded when it comes to first-party software like Visual Studio and .NET. We’ve seen a rise in patching compliance for internal customers developing new solutions with these products, all attributable to improvements in Visual Studio and .NET. As a result, security dividends can exponentially grow through the company and to the ecosystem at large. Our networks, and yours, are more secure thanks to these developments.

Key Takeaways

  • Ensure your software applications are kept up to date to remain secure. Follow this guidance for Visual Studio.
  • Utilizing a common deployment solution in Windows Update for Business and Microsoft Endpoint Manager gains efficiency and mitigates potential errors from manual updating.
  • A single update source on a single day per month dramatically improves the user experience.
  • Innovations in device scanning provide new telemetry, which leads to new solutions for rare-but-important use cases like end-of-life devices and end-of-service software.

Verifying device health at Microsoft with Zero Trust http://approjects.co.za/?big=insidetrack/blog/verifying-device-health-at-microsoft-with-zero-trust/ Fri, 06 Sep 2024 13:51:32 +0000

The post Verifying device health at Microsoft with Zero Trust appeared first on Inside Track Blog.

Microsoft Digital technical stories

Here at Microsoft, we’re using our Zero Trust security model to help us transform the way we verify device health across all devices that access company resources. Zero Trust supplies an integrated security philosophy and end-to-end strategy that informs how our company protects its customers, data, employees, and business in an increasingly complex and dynamic digital world.

Verified device health is a core pillar of our Microsoft Digital Zero Trust security model. Because unmanaged devices are an easy entry point for bad actors, ensuring that only healthy devices can access corporate applications and data is vital for enterprise security. As a fundamental part of our Zero Trust implementation, we require all user devices accessing corporate resources to be enrolled in device-management systems.

Verified devices support our broader framework for Zero Trust, alongside the other pillars of verified identity, verified access, and verified services.

Diagram showing the four pillars of Microsoft’s Zero Trust model: verify identity, verify device, verify access, and verify services.
The four pillars of Microsoft’s Zero Trust model.

[Explore verifying identity in a Zero Trust model. | Unpack implementing a Zero Trust security model at Microsoft. | Discover enabling remote work: Our remote infrastructure design and Zero Trust. | Watch our Enabling remote work infrastructure design using Zero Trust video.]

Verifying the device landscape at Microsoft

The device landscape at Microsoft is characterized by a wide variety of devices. We have more than 220,000 employees and additional vendors and partners, most of whom use multiple devices to connect to our corporate network. We have more than 650,000 unique devices enrolled in our device-management platforms, including devices running Windows, iOS, Android, and macOS. Our employees need to work from anywhere, including customer sites, cafes, and home offices. The transient nature of employee mobility poses challenges to data safety. To combat this, we are implementing device-management functionality to enable the mobile-employee experience—confirming identity and access while ensuring that the devices that access our corporate resources are in a verified healthy state according to the policies that govern safe access to Microsoft data.

Enforcing client device health

Device management is mandatory for any device accessing our corporate data. The Microsoft Endpoint Manager platform enables us to enroll devices, bring them to a managed state, monitor the devices’ health, and enforce compliance against a set of health policies before granting access to any corporate resources. Our device health policies verify all significant aspects of device state, including encryption, antimalware, minimum OS version, hardware configuration, and more. Microsoft Endpoint Manager also supports internet-based device enrollment, which is a requirement for the internet-first network focus in the Zero Trust model.

We’re using Microsoft Endpoint Manager to enforce health compliance across the various health signals and across multiple client device operating systems. Validating client device health is not a one-time process. Our policy-verification processes confirm device health each time a device tries to access corporate resources, much in the same way that we confirm the other pillars, including identity, access, and services. We’re using modern endpoint protection configuration on every managed device, including preboot and postboot protection and cross-platform coverage. Our modern management environment includes several critical components:

  • Microsoft Azure Active Directory (Azure AD) for core identity and access functionality in Microsoft Intune and the other cloud-based components of our modern management model, including Microsoft Office 365, Microsoft Dynamics 365, and many other Microsoft cloud offerings.
  • Microsoft Intune for policy-based configuration management, application control, and conditional-access management.
  • Clearly defined mobile device management (MDM) policy. Policy-based configuration is the primary method for ensuring that devices have the appropriate settings to help keep the enterprise secure and enable productivity-enhancement features.
  • Windows Update for Business is configured as the default for operating system and application updates for our modern-managed devices.
  • Microsoft Defender for Endpoint (MDE) is configured to protect our devices, send compliance data to Azure AD Conditional Access, and supply event data to our security teams.
  • Dynamic device and user targeting for MDM enables us to supply a more flexible and resilient environment for the application of MDM policies. It enables us to flexibly apply policies to devices as they move into different policy scopes.
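The compliance gate this stack enforces can be sketched in a few lines. This is a simplified illustration, not Microsoft's actual policy engine: the specific checks, the version floor, and the device states are hypothetical, but the pattern is the same, every health signal must pass before access is granted, and the evaluation runs on each access attempt rather than once at enrollment.

```python
from dataclasses import dataclass


@dataclass
class DeviceState:
    """A snapshot of the health signals a management platform collects."""
    encrypted: bool
    antimalware_running: bool
    os_version: tuple  # e.g., (10, 0, 22631)


# Hypothetical policy floor for illustration only.
MIN_OS_VERSION = (10, 0, 19045)


def is_compliant(device: DeviceState) -> bool:
    """All health checks must pass; any single failure blocks access."""
    return (
        device.encrypted
        and device.antimalware_running
        and device.os_version >= MIN_OS_VERSION
    )


# Re-evaluated on every access attempt, not just at enrollment.
healthy = DeviceState(True, True, (10, 0, 22631))
missing_antimalware = DeviceState(True, False, (10, 0, 22631))
print(is_compliant(healthy))             # True
print(is_compliant(missing_antimalware)) # False
```

In a real deployment the equivalent of `is_compliant` is expressed as MDM compliance policy, and the result feeds conditional access rather than a local boolean.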

Providing secure access methods for unmanaged devices

While our primary goal is to have users connect to company resources by using managed devices, we also realize that not every user’s circumstances allow for using a completely managed device. We’re using cloud-based desktop virtualization to provide virtual machine–based access to corporate data through a remote connection experience that enables our employees to connect to the data that they need from anywhere, using any device. Desktop virtualization enables us to supply a preconfigured, compliant operating system and application environment in a pre-deployed virtual machine that can be provisioned on demand.

Additionally, we’ve created a browser-based experience allowing access, with limited functionality, to some Microsoft 365 applications. For example, an employee can open Microsoft Outlook in their browser and read and reply to emails, but they will not be able to open any documents or browse any Microsoft websites without first enrolling their devices into management.

Key Takeaways

How we treat the devices that our employees and partners use to access corporate data is an integral component of our Zero Trust model. By verifying device health, we extend the enforcement capabilities of Zero Trust. A verified device, associated with a verified identity, has become the core checkpoint across our Zero Trust model. We’re currently working toward achieving better control over administrative permissions on client devices and a more seamless device enrollment and management process for every device, including Linux-based operating systems. As we continue to strengthen our processes for verifying device health, we’re strengthening our entire Zero Trust model.

Providing employees with virtual loaner devices with Windows 365 http://approjects.co.za/?big=insidetrack/blog/providing-employees-with-virtual-loaner-devices-with-windows-365/ Thu, 05 Sep 2024 15:00:00 +0000

The post Providing employees with virtual loaner devices with Windows 365 appeared first on Inside Track Blog.


Watch as Dave Rodriguez interviews Trent Berghofer about using the Windows 365 Cloud PC platform to provide our employees with virtual loaner PCs when they need a backup machine to keep working.

Rodriguez is a principal product manager on the Frictionless Devices team in Microsoft Digital, the company’s IT organization. He talks with Berghofer about using the Windows 365 Cloud PC platform to provide employees with a low-touch, personalized, secure Windows experience hosted on Microsoft Azure.

“With Windows 365 Cloud PC, we’ve been able to accelerate our digital first support model for hybrid employees and deemphasize our reliance on walk up, in-person support at the on-site service locations,” says Berghofer, general manager of Field IT Management and leader of the Support team in Microsoft Digital.

Issuing Cloud PCs to our employees allows them to get back to work on a machine they already own or have on hand because we don’t have to send them physical backup machines. This gets them back to productivity faster and reduces our costs.

Watch this video to see Trent Berghofer (left) and Dave Rodriguez (right) discuss how we’re using Windows 365 to provide our employees with virtual loaner PCs when they need backup machines to keep working.

Finding and fixing network outages in minutes—not hours—with real-time telemetry at Microsoft http://approjects.co.za/?big=insidetrack/blog/finding-and-fixing-network-outages-in-minutes-not-hours-with-real-time-telemetry-at-microsoft/ Thu, 29 Aug 2024 15:00:00 +0000

The post Finding and fixing network outages in minutes—not hours—with real-time telemetry at Microsoft appeared first on Inside Track Blog.


With more than 600 physical worksites around the world, Microsoft has one of the largest network infrastructure footprints on the planet.

Managing the thousands of devices that keep those locations connected demands constant attention from a global team of network engineers. It’s their job to monitor and maintain those devices. And when outages occur, they lead the charge to repair and remediate the situation.

To support their work, our Real Time Telemetry team at Microsoft Digital, the company’s IT organization, has introduced new capabilities that help engineers identify network device outages and capture data faster and more extensively than ever before. Through real-time telemetry, network engineers can isolate and remediate issues in minutes—not hours—to keep their colleagues productive and our technology running smoothly.

Immediacy is everything

Dave, Sinha, Vijay, and Menten pose for pictures that have been assembled into a collage.
Aayush Dave, Astha Sinha, Abhijit Vijay, Daniel Menten, and Martin O’Flaherty (not pictured) are part of the Microsoft Digital Real Time Telemetry team enabling more up-to-date and extensive network device data.

Conventional network monitoring uses the Simple Network Management Protocol (SNMP) architecture, which retrieves network telemetry through periodic, pull-based polls and other legacy technologies. At Microsoft, that polling interval typically ranges between five minutes and six hours.

SNMP is a foundational telemetry architecture with decades of legacy. It’s ubiquitous, but it doesn’t allow for the most up-to-date data possible.

“The biggest pain point we’ve always heard from network engineers is latency in the data,” says Astha Sinha, senior product manager for the Infrastructure and Engineering Services team in Microsoft Digital. “When data is stale, engineers can’t react quickly to outages, and that has implications for security and productivity.”

Serious vulnerabilities and liabilities arise when a network device outage occurs. But because of lags between polling intervals, a network engineer might not receive information or alerts about the situation until long after it happens.
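The cost of that lag is easy to quantify. With pull-based polling, an outage that begins just after a poll goes unnoticed until the next one: the worst-case detection delay is a full polling interval, and the average is half an interval if outages start at uniformly random times. A small sketch of the arithmetic, using the intervals cited above:

```python
def detection_delay_bounds(polling_interval_minutes: float):
    """With pull-based polling, an outage starting just after a poll
    isn't observed until the next poll: worst case one full interval,
    average half an interval (assuming uniformly random start times)."""
    return polling_interval_minutes / 2, polling_interval_minutes


# The article cites polling intervals ranging from five minutes to six hours.
for interval in (5, 360):
    avg, worst = detection_delay_bounds(interval)
    print(f"{interval}-minute polling: avg {avg} min, worst {worst} min to first signal")
```

At the six-hour end of the range, that is up to a full workday-sized window in which an outage generates no signal at all, which is the gap streaming telemetry closes.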

We assembled the Real Time Telemetry team as part of our Infrastructure and Engineering Services to close that gap.

“We build the tools and automations that network engineers use to better manage their networks,” says Martin O’Flaherty, principal product manager for the Infrastructure and Engineering Services team in Microsoft Digital. “To do that, we need to make sure they have the right signals as early and as consistently as possible.”

The technology that powers these possibilities is known as streaming telemetry. It relies on network devices compatible with the more modern gRPC Network Management Interface (gNMI) telemetry protocol and other technologies to support a push-based approach to network monitoring where network devices stream data constantly.

This architecture isn’t new, but our team is scaling and programmatizing how that data becomes available by creating a real-time telemetry apparatus that collects, stores, and delivers network information to service engineers. These capabilities offer several benefits.
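The push model can be illustrated with a toy producer/consumer. A real collector would subscribe to devices over gNMI's streaming RPCs; the device name, counter field, and alert threshold below are purely illustrative:

```python
import queue
import threading

# Toy model of push-based streaming telemetry: the "device" publishes
# counter samples as they change, and the collector reacts to each one
# immediately instead of waiting for the next polling cycle.
events = queue.Queue()


def device(name: str, samples: list):
    """Stream samples to the collector, then signal end of stream."""
    for sample in samples:
        events.put({"device": name, **sample})
    events.put(None)  # sentinel: stream closed


def collector(alert_threshold: float) -> list:
    """Consume samples as they arrive; flag any above the threshold."""
    alerts = []
    while (event := events.get()) is not None:
        if event["if_utilization"] > alert_threshold:
            alerts.append(event)  # in practice this would page an engineer
    return alerts


producer = threading.Thread(
    target=device,
    args=("msw-redmond-01", [{"if_utilization": 0.42}, {"if_utilization": 0.97}]),
)
producer.start()
high = collector(alert_threshold=0.9)
producer.join()
print(high)  # the 0.97 sample is flagged as soon as it arrives
```

The design point is the inversion of control: the collector never asks, so detection latency is bounded by transport time rather than by a polling schedule, and idle devices generate no polling traffic at all.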

The advantages of real-time network device telemetry

Superior anomaly detection, reduced intent and configuration drift, the foundation for large-scale automation and less network downtime.

Better detection of breaches, vulnerabilities, and bugs through automated scans of OS stalls, lateral device hijacking, malware, and other common vulnerabilities.

Visibility into real-time utilization data on network device stats, as well as steady replacement of current data collection technology and more scalable network growth and evolution.

More rapid network fixes, leading to a reduction in the baselines for time-to-detection and time-to-migration for incidents.

“Devices are proactively sending data without having to wait for requests, so they function more efficiently and facilitate timely troubleshooting and optimization,” says Abhijit Vijay, principal software engineering manager with the Infrastructure and Engineering Services team in Microsoft Digital. “Since this approach pushes data continuously rather than at specific intervals, it also reduces the additional network traffic and scales better in larger, more complex environments.”

At any given time, Microsoft operates 25,000 to 30,000 network devices, managed by engineers working across 10 different service lines. Accounting for all their needs while keeping data collection manageable and efficient requires extensive collaboration and prioritization.

We also had to account for compatibility. With so many network devices in operation, replacement lifecycles vary. Not all of them are currently gNMI-compatible.

Working with our service lines, we identified the use cases that would provide the best possible ROI, largely based on where we would find the greatest benefits for security and where networks offered a meaningful number of gNMI-compatible devices. We also zeroed in on the types of data that would be the most broadly useful. Being selective helped us preserve resources and avoid overwhelming engineers with too much data.

We built our internal solution entirely using Azure components, including Azure Functions and Azure Kubernetes Service (AKS), Azure Cosmos DB, Redis, and Azure Data Lake. The result is a platform that network engineers can use to access real-time telemetry data.

With key service lines, use cases, and a base of technology in place, we worked with network engineers to onboard the relevant devices. From there, their service lines were free to experiment with our solution on real-world incidents.

Better response times, greater network reliability

Service lines are already experiencing big wins.

In one case, a heating and cooling system went offline for a building in the company’s Millennium Campus in Redmond, Washington. A lack of environmental management has the potential to cause structural damage to buildings if left unchecked, so it was important to resolve this issue as quickly as possible. The service line for wired onsite connections sprang into action as soon as they received a network support ticket.

With real-time telemetry enabled, the team created a Kusto query to compare DOT1X access-session data for the day of the outage with a period before the outage started. Almost immediately, they spotted problematic VLAN switching, including the exact time and duration of the outage. By correlating the timestamps, they determined that the RADIUS registrations of the device owner had expired, which caused the devices to switch into the guest network as part of the Zero Trust network implementation.

As a result, the team was able to resolve the registration issues and restore the heating and cooling systems in 10 minutes—a process that might have taken hours using other collection methods due to the lag-time between polling intervals.
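The comparison the team ran in Kusto can be illustrated with a small sketch. The record shapes, device names, VLANs, and timestamps here are hypothetical; the point is the technique of diffing session data against a known-good baseline to pinpoint when and where devices landed on the wrong network:

```python
from datetime import datetime

# Hypothetical baseline: which VLAN each device's 802.1X session
# should land on when its RADIUS registration is valid.
baseline = {"dev-hvac-01": "corp", "dev-hvac-02": "corp"}

# Hypothetical access-session records from the day of the outage.
outage_day = [
    {"device": "dev-hvac-01", "vlan": "corp",  "ts": "2024-05-01T08:55:00"},
    {"device": "dev-hvac-01", "vlan": "guest", "ts": "2024-05-01T09:02:00"},
    {"device": "dev-hvac-02", "vlan": "guest", "ts": "2024-05-01T09:03:00"},
]


def vlan_switches(records, expected_vlan_by_device):
    """Flag sessions that landed on an unexpected VLAN, with timestamps.
    An expired registration falling back to the guest VLAN shows up here."""
    return [
        (r["device"], r["vlan"], datetime.fromisoformat(r["ts"]))
        for r in records
        if r["vlan"] != expected_vlan_by_device.get(r["device"])
    ]


for device, vlan, ts in vlan_switches(outage_day, baseline):
    print(f"{ts}: {device} unexpectedly joined VLAN '{vlan}'")
```

The timestamps on the flagged rows bound the outage window directly, which is what let the team correlate the VLAN switch with the expired registrations.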

“This has the potential to improve alerting, reduce outages, and enhance security,” says Daniel Menten, senior cloud network engineer for site infrastructure management on the Site Wired team. “One of the benefits of real-time telemetry is that it lets us capture information that wasn’t previously available—or that we received too slowly to take action.”

It’s about speeding up how we identify issues and how we then respond to them.  

“With this level of observability, engineers that monitor issues and outages benefit from enhanced experiences,” says Aayush Dave, a product manager on the Infrastructure and Engineering Services team in Microsoft Digital. “And that’s going to make our network more reliable and performant in a world where security issues and outages can have a global impact.”

The future is in real time

Now that real-time telemetry has demonstrated its value, our efforts are focused on broadening and deepening the experience.

“More devices mean more impact,” Dave says. “By increasing the number of network devices that facilitate real-time telemetry, we’re giving our engineers the tools to accelerate their response to these incidents and outages, all leading to enhanced performance and a more robust network reliability posture.”

It’s also about layering on new ways of accessing and using the data.

We’ve just released a preview UI that provides a quick look at essential data, as well as an all-up view of devices in an engineer’s service line. This dashboard will enable a self-service model that makes it even easier to isolate essential telemetry without the need for engineers to create or integrate their own interfaces.

That kind of observability isn’t only about outages. It also enables optimization by helping engineers understand and influence how devices work together.

The depth and quality of real-time telemetry data also provides a wealth of information for training AI models. With enough data spread across enough devices, predictive analysis might be able to provide preemptive alerts when the kinds of network signals that tend to accompany outages appear.

“We’re paving the way for an AIOps future where the system won’t just predict potential issues, but initiate self-healing actions,” says Rob Beneson, partner director of software engineering on the Infrastructure and Engineering Services team in Microsoft Digital.

It’s work that aligns with our company mission.

“This transformation is enhancing our internal user experience and maintaining the network connectivity that’s critical for our ultimate goal,” Beneson says. “We want to empower every person and organization on the planet to achieve more.”

Key Takeaways

Here are some tips for getting started with real-time telemetry at your company:

  • Start with your users. Ask them about pain points, what scares them, and what they need.
  • Start small and go step by step to get the core architecture in place, then work up to the glossier UI and UX elements.
  • Be mindful of onboarding challenges like bugs in vendor hardware and software, especially around security controls.
  • You’ll find plenty of edge cases and code fails, so be prepared to invest in revisiting challenges and fixing problems that arise.
  • Make sure you have a use case and a problem to solve. Have a plan to guide your adoption and use before you turn on real-time telemetry.
  • Make sure you have the proper data infrastructure in place and an apparatus for storing your data.
  • Communicate and demonstrate the value of this solution to the teams who need to invest resources into onboarding it.
  • Prioritize visibility into the devices and data you’ve onboarded through pilots and hero scenarios, then scale onboarding further according to your teams’ needs.
  • Integrate as much as possible. Consider visualizations and pushing into existing network graphs and tools to surface data where engineers already work.

The post Finding and fixing network outages in minutes—not hours—with real-time telemetry at Microsoft appeared first on Inside Track Blog.

]]>
16333
Hardware-backed Windows 11 empowers Microsoft with secure-by-default baseline http://approjects.co.za/?big=insidetrack/blog/hardware-backed-windows-11-empowers-microsoft-with-secure-by-default-baseline/ Wed, 28 Aug 2024 15:00:12 +0000 http://approjects.co.za/?big=insidetrack/blog/?p=11692 Windows 11 makes secure-by-default viable thanks to a combination of modern hardware and software. This ready out-of-the-box protection enables us to create a new baseline internally across Microsoft, one that level sets our enterprise to be more secure for a hybrid workplace. “We’ve made significant strides to create chip-to-cloud Zero Trust out of the box,” […]

The post Hardware-backed Windows 11 empowers Microsoft with secure-by-default baseline appeared first on Inside Track Blog.

]]>
Microsoft Digital stories

Windows 11 makes secure-by-default viable thanks to a combination of modern hardware and software. This ready out-of-the-box protection enables us to create a new baseline internally across Microsoft, one that brings our whole enterprise up to a more secure level for a hybrid workplace.

“We’ve made significant strides to create chip-to-cloud Zero Trust out of the box,” says David Weston, vice president of Enterprise and OS Security at Microsoft. “Windows 11 is redesigned for hybrid work and security with built-in hardware-based isolation, proven encryption, and our strongest protection against malware.”

This new baseline for protection is one of several reasons Microsoft upgraded to Windows 11.

In addition to a better user experience and improved productivity for hybrid work, the new hardware-backed security features create the foundation for new protections. This empowers us to not only protect our enterprise but also our customers.

[Discover how Microsoft uses Zero Trust to protect our users. Learn how new security features for Windows 11 help protect hybrid work. Find out about Windows 11 security by design from chip to the cloud. Get more information about how Secured-core devices protect against firmware attacks.]

How Windows 11 advanced our security journey

Weston smiles in a portrait photo.
Upgrading to Windows 11 gives you more out-of-the-box security options for protecting your company, says David Weston, vice president of Enterprise and OS Security at Microsoft.

Security has always been the top priority here at Microsoft.

We process an average of 65 trillion signals per day, 2.5 billion of which are endpoint queries, and we block more than 1,200 password attacks every second. We can analyze these threats to get better at guarding our perimeter, but we can also put new protections in place to reduce the risk posed by persistent attacks.

In 2019, we announced Secured-core PCs designed to utilize firmware protections for Windows users. Enabled by Trusted Platform Module (TPM) 2.0 chips, Secured-core PCs protect encryption keys, user credentials, and other sensitive data behind a hardware barrier. This prevents bad actors and malware from accessing or altering user data and goes a long way in addressing the volume of security events we experience.

“Our data shows that these devices are more resilient to malware than PCs that don’t meet the Secured-core specifications,” Weston says. “TPM 2.0 is a critical building block for protecting user identities and data. For many enterprises, including Microsoft, TPM facilitates Zero Trust security by measuring the health of a device using hardware that is resilient to tampering common with software-only solutions.”

We’ve long used Zero Trust—always verify explicitly, offer least-privilege access, and assume breach—to keep our users and environment safe. Rather than behaving as though everything behind the corporate firewall is secure, Zero Trust reinforces a motto of “never trust, always verify.”

The additional layer of protection offered by TPM 2.0 makes it easier for us to strengthen Zero Trust. That's why hardware plays a big part in Windows 11 security features. The hardware-backed features of Windows 11 add further barriers against malware, ransomware, and more sophisticated hardware-based attacks.

At a high level, Windows 11 enforced sets of functionalities that we needed anyway. It drove the environment to demonstrate that we were more secure by default. Now we can enforce security features in the Windows 11 pipeline to give users additional protections.

—Carmichael Patton, principal program manager, Digital Security and Resilience

Windows 11 is the alignment of hardware and software to elevate security capabilities. By enforcing a hardware requirement, we can now do more than ever to keep our users, products, and customers safe.

Setting a new baseline at Microsoft

Patton smiles in a portrait photo.
Windows 11 reduces how many policies you need to set up for your security protections to kick in, says Carmichael Patton, a principal program manager with Microsoft Digital Security and Resilience.

While some security features were previously available via configuration, TPM 2.0 allows Windows 11 to protect users immediately, without IT admins or security professionals having to set specific policies.

“At a high level, Windows 11 enforced sets of functionalities that we needed anyway,” says Carmichael Patton, a principal program manager with Digital Security and Resilience, the organization responsible for protecting Microsoft and our products. “It drove the environment to demonstrate that we were more secure by default. Now we can enforce security features in the Windows 11 pipeline to give users additional protections.”

Thus, getting Windows 11 out to our users was a top priority.

Over the course of five weeks, we deployed Windows 11 across 90 percent of eligible devices at Microsoft. It proved to be our least disruptive release to date, and it ensured our users would be immediately covered by baseline protections for a hybrid world.

We can now look across our enterprise and know that users running Windows 11 have a consistent level of protection in place.

The real impact of secure-by-default

Moving from configurable to built-in protection means that Windows 11 becomes the foundation for secure systems as you move up the stack.

It simplifies everything for everyone, including IT admins who may not also be security experts. You can change configurations and optimize Windows 11 protections based on your needs or rely on default security settings. Secure-by-default extends the same flexibility to users, allowing them to safely choose their own applications while still maintaining tight security.

—David Weston, vice president, Enterprise and OS Security

Applications, identity, and the cloud are able to build off the hardware root-of-trust that Windows 11 derives from TPM 2.0. Application security measures like Smart App Control and passwordless sign-in from Windows Hello for Business are all enabled due to hardware-backed protections in the operating system.

Secure-by-default does all of this without removing the important flexibility that has always been part of Windows.

“It simplifies everything for everyone, including IT admins who may not also be security experts,” Weston says. “You can change configurations and optimize Windows 11 protections based on your needs or rely on default security settings. Secure-by-default extends the same flexibility to users, allowing them to safely choose their own applications while still maintaining tight security.”

Key Takeaways

Going forward, IT admins working in Windows 11 no longer need to put extra effort into enabling and testing security features for performance compatibility. Windows 11 makes it easier for us to gain security value without extra work.

This is important when you consider productivity, one of the other drivers for Windows 11. We need to empower our users to stay productive wherever they are. These new security components go hand-in-hand with our productivity requirements. Our users stay safe without seeing any decline in quality, performance, or experience.

“With Windows 11, the focus is on productivity and thinking about security from the ground up,” Patton says. “We know we can do these amazing things, especially with security being front and center.”

Now that Windows 11 is deployed across Microsoft, we can take advantage of TPM 2.0 to bring even greater protections to our users, customers, and products. We’ve already seen this with the Windows 11 2022 update.

For example, Windows Defender Application Control (WDAC) enables us to prevent scripting attacks while protecting users from running untrusted applications associated with malware. Other updates include improvements to IT policy and compliance through config lock: a feature that monitors and prevents configuration drift from occurring when users with local admin rights change settings.

These are the kinds of protections made possible with Windows 11.

“Future releases of Windows 11 will continue to add significant security updates that add even more protection from the chip to the cloud by combining modern hardware and software,” Weston says. “Windows 11 is a better way for everyone to collaborate, share, and present, all with the confidence of hardware-backed protections.”

The post Hardware-backed Windows 11 empowers Microsoft with secure-by-default baseline appeared first on Inside Track Blog.

]]>
11692
Reimagining content management at Microsoft with SharePoint Premium http://approjects.co.za/?big=insidetrack/blog/reimagining-content-management-at-microsoft-with-sharepoint-premium/ Thu, 15 Aug 2024 16:10:38 +0000 http://approjects.co.za/?big=insidetrack/blog/?p=16193 At Microsoft, we’ve rolled out SharePoint Premium across the company, including in Microsoft Digital, the company’s IT organization where we’re using it to transform how the company manages its content. SharePoint is the backbone of our content management and collaboration strategy. We use it to enable our employees to access, share, and co-create documents across […]

The post Reimagining content management at Microsoft with SharePoint Premium appeared first on Inside Track Blog.

]]>
Microsoft Digital technical stories

At Microsoft, we’ve rolled out SharePoint Premium across the company, including in Microsoft Digital, the company’s IT organization where we’re using it to transform how the company manages its content.

SharePoint is the backbone of our content management and collaboration strategy. We use it to enable our employees to access, share, and co-create documents across teams and devices for more than 600,000 sites containing 350 million pieces of content and more than 12 petabytes of data. It’s at the core of everything we do, from being the place where individual employees and small teams store and share their work, to being home to our very largest portals, where the entire company comes together to find news and perform important common tasks.

At this scale, we continually face the challenge of ensuring that our content stored in SharePoint is secure, compliant, and easy to find and use.

It’s a big task, according to Stan Liu, senior product manager and knowledge management lead at Microsoft Digital.

Liu and Peer appear in a composite image.
Stan Liu (left), Ray Peer (right), and Sean Squires (not pictured) are part of a team that's deploying SharePoint Premium to create a new culture of content management at Microsoft.

“We have a complex environment,” Liu says. “With more than 300,000 users accessing the Microsoft 365 tenant across multiple global regions, a significant amount of content is being created and stored within our SharePoint environment.”

Liu is no stranger to the challenges of managing SharePoint at scale.

“We have several teams creating content and many trying to find content,” he says. “Discoverability is always at the front of our minds and making content easy to find requires time and effort in SharePoint.”

Liu’s team is focused on making content management as simple and effective as possible for Microsoft employees. SharePoint users at Microsoft Digital perform many manual tasks to keep SharePoint content secure, compliant, and easy to find and use. They apply their efforts to provide better governance over constantly increasing digital content, prevent accidental sharing, and effectively manage the content lifecycle.

At this scale, with the challenges of discoverability and manual effort clearly in focus, Liu’s team has turned to SharePoint Premium to meet these challenges and prepare Microsoft Digital for the next generation of content management and usage scenarios.

Discovering, automating, and more with SharePoint Premium

SharePoint Premium uses the power of Microsoft Azure Cognitive Services and the Microsoft Power Platform to bring AI, automation, and added security to SharePoint content experiences, processing, and governance. It delivers new ways to engage with our most critical content, managing and protecting it through its lifecycle.

AI is at the root of the SharePoint Premium feature set, enhancing productivity and collaboration. AI-driven search provides personalized and relevant search results by understanding user intent and context. AI-powered insights help users discover patterns and trends in their data, enabling more informed decision-making. AI-automated workflows and content management streamline processes, while AI-infused advanced security measures ensure data protection.

SharePoint Premium includes a large set of services, including:

  • Autofill columns. Autofill columns use large language models to automatically pull, condense, or create content from files in a SharePoint document library. This feature allows selected columns to store metadata without manual input, simplifying file management and data organization.
  • Content assembly. Content assembly automates the creation of routine business documents, including contracts, statements of work, service agreements, consent letters, and other types of correspondence.
  • Document processing. Using prebuilt, structured, unstructured, and freeform document processing models, SharePoint Premium can extract information from many document types, such as contracts, invoices, and receipts. It can also detect and extract sensitive information from documents.
  • Image tagging. Image tagging helps users find and manage images in SharePoint document libraries. The image-tagging service automatically tags images with descriptive keywords using AI. These keywords are stored in a managed metadata column, making it easier to search, sort, filter, and manage the images.
  • Taxonomy tagging. Taxonomy tagging helps users find and manage terms in SharePoint document libraries. SharePoint Premium uses AI to automatically tag documents with terms or term sets configured in the taxonomy store. These terms and sets are stored in a managed metadata column, making documents easier to search, sort, filter, and manage.
  • Document translation. SharePoint Premium can create a translated copy of a document or video transcript in a SharePoint document library while preserving the file’s original format and structure.
  • SharePoint eSignature. SharePoint eSignature facilitates the sending of electronic signature requests, ensuring documents remain within Microsoft 365 during the review and signing process. eSignature can efficiently and securely dispatch documents to be signed by individuals within or outside the organization.
  • Optical character recognition. The optical character recognition (OCR) service extracts printed or handwritten text from images. SharePoint Premium automatically scans the image files, extracts the relevant text, and makes the text from the images available for search and indexing. This enables quick and accurate location of key phrases and terms.
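
To make taxonomy tagging more concrete, here's a minimal sketch of the idea behind it: match terms from a term store against document text and record the hits as metadata. This is a simplified illustration with invented term names; the actual SharePoint Premium service uses AI models and stores results in managed metadata columns rather than doing exact keyword matching.

```python
def taxonomy_tag(document_text, term_sets):
    """Return the metadata tags whose terms appear in the document.

    term_sets maps a term-set name to its terms, mimicking a taxonomy
    store; matching here is exact and case-insensitive for simplicity.
    """
    text = document_text.lower()
    tags = {}
    for set_name, terms in term_sets.items():
        matched = sorted(t for t in terms if t.lower() in text)
        if matched:
            tags[set_name] = matched
    return tags

terms = {"Document Type": ["invoice", "contract"], "Region": ["EMEA", "Americas"]}
doc = "This services contract covers all EMEA subsidiaries."
print(taxonomy_tag(doc, terms))
# → {'Document Type': ['contract'], 'Region': ['EMEA']}
```

Once tags like these land in metadata columns, the downstream wins follow automatically: search, sorting, filtering, and lifecycle policies can all key off the same terms.
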

“SharePoint Premium is really built around discovery and automation, with a huge emphasis on AI to help perform tasks efficiently at scale,” says Sean Squires, a principal product manager in the OneDrive and SharePoint Product Group. “We need that granular control and understanding of how our content and intellectual property is represented, shared, and used.”

Creating a culture of content management

There’s also a cultural element that’s critical to the team’s work.

“SharePoint Premium represents a shift in how Microsoft Digital approaches content management, not just as a new technology but as a new way of working,” Liu says. “It’s about integrating AI capabilities into daily practices to automate mundane tasks like tagging content, making it more discoverable, and keeping it up to date. This integration aims to make content management a part of daily habits and routines, ensuring content remains relevant and useful.”

Liu highlights the importance of making content management a daily habit and how AI can simplify the process. He recognizes the need for a cultural shift to incentivize active participation in content management. It’s also important to measure the impact of content contributions on others. The goal is to make content management processes, such as classifying content, a regular practice to ensure high-quality content within the enterprise.

Part of the cultural shift is in how we think about SharePoint itself. Moving from “site-centric” to “document-centric” usage of SharePoint signifies a strategic shift in how we manage SharePoint content at Microsoft Digital. Metadata and content context are critical to ensuring our content is easy to find and relevant, and we’re leaning on SharePoint Premium features to help us do that. Incentivizing active participation in content management and making it a daily habit for our employees is critical to a wider and more consistent realization of the benefits provided by SharePoint Premium across the organization.

“How do we find ways to make things easier without somebody having to do anything?” asks Ray Peer, a senior product manager in Microsoft Digital. “That’s where we’re using the SharePoint Premium AI capabilities to help with things like automatic processing and auto-tagging. These are mundane tasks that people don’t like to do. So instead of just forcing change on the culture, we’re finding ways to make it easier for the culture to change.”

Microsoft Digital has already seen huge successes in making it easier for the culture to change with SharePoint Premium.

The Microsoft Cloud Operations & Innovation Finance team struggled to track and manage their invoices accurately. At times, the team found it difficult to locate unpaid invoices or to spot missing information in them. These issues made payments harder to track and delayed finding the right invoice.

To address these issues, they created a SharePoint site dedicated to invoice management for the finance team. It used the prebuilt SharePoint Premium document processing models to automatically extract important data from invoices uploaded to the document library, including PO numbers, dates, amounts, and client information. They added column metadata to track payment status and applied conditional formatting and highlighting to categorize invoices and draw attention to missing information in invoice fields.
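
The core logic of that workflow can be sketched in a few lines: take the fields a document-processing model extracts, then flag any record with missing values for follow-up. This is an illustrative stand-in, not the SharePoint Premium implementation; the field names and sample invoices are invented.

```python
REQUIRED_FIELDS = ("po_number", "date", "amount", "client")

def triage_invoices(invoices):
    """Split extracted invoice records into complete and needs-attention lists.

    Each record mimics the fields a document-processing model might extract;
    records missing any required field are flagged, much as the finance
    team's conditional formatting highlighted gaps in their library.
    """
    complete, needs_attention = [], []
    for inv in invoices:
        missing = [f for f in REQUIRED_FIELDS if not inv.get(f)]
        if missing:
            needs_attention.append((inv.get("po_number") or "<unknown>", missing))
        else:
            complete.append(inv["po_number"])
    return complete, needs_attention

invoices = [
    {"po_number": "PO-1001", "date": "2024-03-01", "amount": 1200.0, "client": "Fabrikam"},
    {"po_number": "PO-1002", "date": "2024-03-04", "amount": None, "client": "Contoso"},
]
print(triage_invoices(invoices))
# → (['PO-1001'], [('PO-1002', ['amount'])])
```

In the real solution, the extraction and the highlighting both happen inside the document library itself, so the finance team never has to run anything by hand.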

It’s a perfect example of how an AI-driven feature like document processing in SharePoint Premium can radically transform a business process within a simple SharePoint document library. The solution reduced costs, decreased processing times, improved accuracy, and enabled better compliance for the Microsoft Cloud Operations & Innovation Finance team.

Peer reiterates that solutions like this have a way of gaining momentum in the organization.

“This solution quickly came to the attention of other finance-based departments within Microsoft,” Peer says. “Other managers wanted the same benefits and asked for the same solution. It was easy to replicate, and suddenly, those benefits were multiplied across the company.”

It’s not an isolated situation. Many other business groups have similar stories.

The Microsoft Partner Incentive Operations team sends hundreds of letters to Microsoft partners daily using a set of Microsoft Word templates. IT staff created and updated the templates manually. On average, it took 75 minutes to create a template and 30 minutes to review each letter and send it to a partner organization.

To improve efficiency, they implemented a new letter generation process for partner letters based on the SharePoint Premium Content Assembly service. They created a SharePoint modern template document for each letter type they used and integrated the templates with data sourced from internal systems, customized for each partner by market, region, and sales offer type.

The new solution created a flexible method for creating partner letters with dynamic placeholders in the document and multiple letter formats, including text, tables, and conditional sections, all driven by a self-serve UI. Letter creators could completely automate the letter creation process without any manual intervention.
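
The placeholder-filling idea at the heart of content assembly can be sketched with a simple template, as below. This is only an analogy for how dynamic placeholders work; the field names and letter text are invented, and the real service also supports tables, conditional sections, and a self-serve UI.

```python
from string import Template

# A toy stand-in for a SharePoint modern template with dynamic placeholders.
LETTER_TEMPLATE = Template(
    "Dear $partner,\n\n"
    "Your $offer incentive for the $region market has been approved.\n"
    "Amount: $amount\n"
)

def assemble_letters(records):
    """Fill one letter per partner record pulled from internal systems."""
    return [LETTER_TEMPLATE.substitute(r) for r in records]

records = [
    {"partner": "Contoso", "offer": "sales", "region": "EMEA", "amount": "$5,000"},
]
print(assemble_letters(records)[0])
```

Because the data flows straight from source systems into the placeholders, every letter is consistent by construction, which is where the 6,000 hours per year of savings described below come from.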

The new solution created more consistent partner letter results, and the automated process saved the team more than 6,000 hours per year in manual template creation and refresh tasks, leading to a 30% increase in business agility and a decrease in time-to-market.

Integrating Microsoft 365 Copilot with SharePoint Premium

Microsoft 365 Copilot integrates seamlessly with SharePoint Premium to enhance its capabilities, particularly in automation and AI. The content AI and intelligent document processing built into SharePoint Premium use advanced machine learning models to classify and organize content, extract relevant information, and automate workflows at scale. These improvements in metadata and content quality directly improve Copilot's performance and results.

Copilot complements SharePoint Premium by using large language models to assist with document creation, Q&A, and running complex queries. It can help find specific documents based on criteria and automate tasks like translations or routing documents to appropriate teams. The integration aims to democratize the ability to configure complex machine learning models, making it easier for users to apply them to their content and achieve significant productivity gains.

The symbiotic relationship between Copilot and SharePoint Premium is particularly evident in their shared goal of automating content processing. For example, SharePoint Premium can automatically tag documents with metadata, which Copilot can then use to perform more robust queries and assist with organizing content. This collaboration represents a step towards a future where sophisticated AI-driven workflows are accessible to all users, enhancing productivity and efficiency across the organization.

It’s a vision that’s already becoming a reality at Microsoft Digital.

Looking forward

We’re anticipating a near future where AI-based content management capabilities and automation fully intersect with large language models and language understanding services to create a sophisticated combination of intelligence and automation.

“We can easily envision the capability to perform a set of complex tasks over complex content with a single prompt,” Squires says. “I might ask Microsoft 365 Copilot to find all invoices for the Fabrikam company worth more than $10,000 from 2023 and send copies of those invoices to my finance manager. SharePoint Premium is putting that future within reach at Microsoft Digital, and that’s exciting.”

Microsoft Digital will continue to invest in SharePoint Premium capabilities across the organization and work with the product group as Customer Zero, growing SharePoint Premium features to push the boundaries of what's possible with AI-powered content management.

Key Takeaways

Here are a few takeaways that can help you get started with SharePoint Premium in your organization:

  • Explore the different Content AI services that SharePoint Premium offers, such as autofill columns, content assembly, document processing, image tagging, taxonomy tagging, document translation, eSignature, and optical character recognition.
  • Identify the business processes and scenarios in your organization that could benefit from AI-driven content management and automation, such as invoice tracking, partner or customer correspondence, document creation, and content discovery.
  • Learn how to configure and use SharePoint Premium features in your SharePoint document libraries, such as creating and applying metadata columns, setting up content assembly templates, enabling document processing models, and using image and taxonomy tagging.
  • Integrate Microsoft 365 Copilot with SharePoint Premium to enhance your content experiences and workflows, such as querying for specific documents, translating content, routing documents to appropriate teams, and creating documents with natural language prompts.

The post Reimagining content management at Microsoft with SharePoint Premium appeared first on Inside Track Blog.

]]>
16193
Boosting employee device procurement at Microsoft with better forecasting http://approjects.co.za/?big=insidetrack/blog/boosting-employee-device-procurement-at-microsoft-with-better-forecasting/ Fri, 28 Jun 2024 15:16:15 +0000 http://approjects.co.za/?big=insidetrack/blog/?p=9836 Device forecasting at Microsoft has allowed the company to plan for new hires, replace out-of-warranty devices for existing employees, and respond to major events, like the release of Windows 11. As a result, we’ve been able to strategically acquire equipment in a more efficient way. It all started with a shift to remote work. “New […]

The post Boosting employee device procurement at Microsoft with better forecasting appeared first on Inside Track Blog.

]]>
Microsoft Digital stories

Device forecasting at Microsoft has allowed the company to plan for new hires, replace out-of-warranty devices for existing employees, and respond to major events, like the release of Windows 11. As a result, we've been able to strategically acquire equipment in a more efficient way.

It all started with a shift to remote work.

“New employees will always need a device on day one,” says Pandurang Kamath Savagur, a senior program manager with Microsoft Digital, the organization that powers, protects, and transforms the company. “But for the first time ever, we were also in an experience where people had to stay productive from home with only a single device. They couldn’t easily get into the offices for a secondary or loaner device.”

To anticipate demand and offset delays, Microsoft Digital built a platform where administrators across the company could project the number of devices they'd need. Simultaneously, the group took a deep dive into the current device population to forecast the number of employees who would need a device refresh—all in time for the deployment of Windows 11.

[Discover how Microsoft quickly upgraded to Windows 11. Find out how Microsoft is reinventing the employee experience for a hybrid world. Learn more about verifying devices in a Zero Trust model.]

Getting better at predicting the future

Historically, Microsoft didn’t need to build up a large inventory of devices for employees; everything was made to order.

Business groups own the budget, so they know what the next six months will look like for their team. Microsoft onboards approximately 3,000 employees each month, and every employee needs to select and set up a device. We can’t just buy 3,000 devices a month—we need to know specifications about how it will be used.

—Pandurang Kamath Savagur, senior program manager, Microsoft Digital

It worked a little bit like this:

Procurement, having already certified devices and negotiated pricing and SLAs suitable for employees, enables administrators or employees themselves to obtain a new device through our internal ProcureWeb tool. The tool places a purchase order directly with the OEM—the third-party manufacturer of the device—or a reseller, which then manufactures and ships the equipment to the user.

But the shift in how people worked meant we’d need to be more proactive in procuring devices for employees. And to get there, we’d need a better picture of fluctuating demand.

“Business groups own the budget, so they know what the next six months will look like for their team,” Savagur says. “Microsoft onboards approximately 3,000 employees each month, and every employee needs to select and set up a device. We can’t just buy 3,000 devices a month—we need to know specifications about how it will be used.”

Everything from storage space, computing power, memory, and keyboard language to the number of units would need to be collected from business groups. Once that information came in, Procurement could work with OEMs to have machines ready and available to be delivered to administrators well in advance.

This new approach to device forecasting has streamlined the way Microsoft acquires devices, giving us adequate stock to ensure a good experience. We can now anticipate device purchases for new hires while also accounting for break fixes.

And the timing of this effort couldn’t have been better—Windows 11 was on the way, and we would need this new approach along with additional analysis to get the new operating system into the hands of employees.

Empowering Microsoft with Windows 11

Released in late 2021, Windows 11 gives us the enterprise-grade security that Microsoft requires. To achieve this secure-by-default state, we needed to replace older devices with equipment that met the Windows 11 hardware requirements.

But instead of issuing new devices to everyone at launch—something that would be both costly and logistically impossible—we took a strategic approach, using a combination of telemetry and machine learning to identify and prioritize devices for replacement.

Cheng and Sawant smile in portrait photos that have been brought together in a photo collage.
Anqi Cheng and Neeti Sawant teamed up to transform the way the company handles its internal device forecasting. Cheng is a data scientist with the W+D Data team, and Sawant is a data engineer with Microsoft Digital.

“We have telemetry data, application usage, and warranty information, and that gives us a base to forecast from in Power BI,” says Neeti Sawant, a data engineer with Microsoft Digital who helped create a device forecasting dashboard as part of this effort. “It told us what we needed to monitor and forecast, which devices are aging out, and when they would be eligible for a refresh.”

But we weren’t just relying on warranty data alone.

Using Microsoft Azure Cosmos DB and Azure Databricks for machine learning, we can apply survival modeling techniques to historical device-population data, predicting how many Windows 11-ineligible primary devices would still be active in the years leading up to Windows 10 end of support.
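
In spirit, a survival-model projection works like the minimal sketch below: estimate, from history, the probability that a device of a given age stays in service another year, then roll the fleet forward. The fleet counts and survival probabilities here are invented for illustration; the production model is trained on real telemetry and warranty data.

```python
def project_active_devices(age_counts, survival_by_age, years):
    """Project how many currently active devices remain active each year.

    age_counts: {device_age_in_years: count of active devices}
    survival_by_age: probability a device of a given age stays active one
    more year (estimated from historical retirement data).
    Returns the expected active count after each of `years` years.
    """
    projections = []
    counts = dict(age_counts)
    for _ in range(years):
        next_counts = {}
        for age, count in counts.items():
            p = survival_by_age.get(age, 0.0)  # unseen ages assumed retired
            next_counts[age + 1] = next_counts.get(age + 1, 0.0) + count * p
        counts = next_counts
        projections.append(round(sum(counts.values())))
    return projections

fleet = {1: 1000, 2: 800, 3: 500}           # hypothetical active devices by age
survival = {1: 0.95, 2: 0.80, 3: 0.50, 4: 0.20}
print(project_active_devices(fleet, survival, 2))  # → [1840, 1130]
```

A projection like this tells Procurement roughly how many refresh purchases to stage per year, which is exactly the planning signal the forecasting effort was built to provide.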

Device forecasting has allowed us to work closely with OEMs so that devices are available on time and so that we’re not selecting on availability, but rather meeting all the performance, compliance, and security needs of our users. Satisfaction scores from employees have increased by 20 points since we started doing this.

—Pandurang Kamath Savagur, senior program manager, Microsoft Digital

“Not all users will replace their device at the end of warranty,” says Anqi Cheng, a data scientist with the W+D Data team at Microsoft. “Although many devices will naturally age out over time, many users hang on to their devices for an extended time. When combined with other device forecasting data, we had a holistic view of the landscape.”

This level of analysis ensured Microsoft would be able to quickly develop a roadmap for getting employees on Windows 11.

A bright forecast for Microsoft

Employees at Microsoft can—and should—expect to have a device that engages, protects, and empowers them. Device forecasting makes this possible.

“Device forecasting has allowed us to work closely with OEMs so that devices are not selected on availability, but rather meeting all the performance, compliance, and security needs of our users,” Savagur says. This effort has resulted in a better experience for employees. “Satisfaction scores from employees have increased by 20 points since we started doing this.”

Access to device forecasting information has also been helpful to admins and Finance, who now have a better idea as to which devices will need to be refreshed for Windows 11. Moving into the future, these same projections will make it easier for Procurement to put the right device into an employee’s hands.

“With the analysis provided to us by Microsoft Digital, we can now understand how many primary devices are in our environment and when we expect them to refresh,” says Colby McNorton, a senior program manager on the Microsoft Procurement team. “As we look forward, instead of the purchasing journey being reactive, we can proactively reach out to users and tell them that their device is at the end of its life and even recommend a device based on what we know about usage.”

Thanks to Windows Autopilot, new devices are automatically pre-configured with Windows 11. Windows Autopilot deploys an OEM-optimized version of the Windows client, so you don’t have to maintain custom images and drivers for every device model. This makes new devices business-ready faster, empowering employees to stay engaged and protected. Users can just switch on, sign in, and all policies and apps will be in place within a day.

 

Key Takeaways

  • Be sure to get visibility into your device population. Find out what kinds of devices are on your network, where they’re located, who owns them, and what stage they’re at in their lifecycle. This gives you a lot of agility in a changing environment. You can do this using Microsoft Intune.
  • Windows 10 and Windows 11 can be co-managed side by side using the same tools and processes, which makes it possible for Microsoft and other companies to be methodical about replacing devices.
  • Spend time with team admins who understand user needs. This allows you to cultivate a short list of devices that are best suited for your employees and gives procurement clear priorities.


The post Boosting employee device procurement at Microsoft with better forecasting appeared first on Inside Track Blog.

Improving security by protecting elevated-privilege accounts at Microsoft http://approjects.co.za/?big=insidetrack/blog/improving-security-by-protecting-elevated-privilege-accounts-at-microsoft/ Fri, 21 Jun 2024 12:50:21 +0000

[Editor’s note: This content was written to highlight a particular event or moment in time. Although that moment has passed, we’re republishing it here so you can see what our thinking and experience was like at the time.]

An ever-evolving digital landscape is forcing organizations to adapt and expand to stay ahead of innovative and complex security risks. Increasingly sophisticated and targeted threats, including phishing campaigns and malware attacks, attempt to harvest credentials or exploit hardware vulnerabilities, allowing attackers to move to other parts of the network, where they can do more damage or gain access to unprotected information.

We on the Microsoft Digital Employee Experience (MDEE) team, like many IT organizations, used to employ a traditional IT approach to securing the enterprise. We now know that effective security calls for a defense-in-depth approach that requires us to look at the whole environment—and everyone that accesses it—to implement policies and standards that better address risks.

To dramatically limit our attack surface and protect our assets, we developed and implemented our own defense-in-depth approach. This includes new company standards, telemetry, monitoring, tools, and processes to protect administrators and other elevated-privilege accounts.

In an environment where there are too many administrators, or elevated-privilege accounts, there is an increased risk of compromise. When elevated access is persistent or elevated-privilege accounts use the same credentials to access multiple resources, a compromised account can become a major breach.

This blog post highlights the steps we are taking at Microsoft to protect our environment and administrators, including new programs, tools, and considerations, and the challenges we faced. We will provide some details about the new “Protect the Administrators” program that is positively impacting the Microsoft ecosystem. This program takes security to the next level across the entire enterprise, ultimately changing our digital-landscape security approach.

[Learn how we’re protecting high-risk environments with secure admin workstations. Read about implementing a Zero Trust security model at Microsoft. Learn more about how we manage Privileged Access Workstations.]

Understanding defense-in-depth protection

Securing all environments within your organization is a great first step in protecting your company. But there’s no silver-bullet solution that will magically counter all threats. At Microsoft, information protection rests on a defense-in-depth approach built on device health, identity management, and data and telemetry—a concept illustrated by the three-legged security stool, in the graphic below. Getting security right is a balancing act. For a security solution to be effective, it must address all three aspects of risk mitigation on a base of risk management and assurance—or the stool topples over and information protection is at risk.

Information protection depicted as a stool with three legs that represent device health, identity management, and data and telemetry.
The three-legged-stool approach to information protection.

Risk-based approach

Though we would like to be able to fix everything at once, that simply isn’t feasible. We created a risk-based approach to help us prioritize every major initiative. We used a holistic strategy that evaluated all environments, administrative roles, and access points to help us define our most critical roles and resources within the Microsoft ecosystem. Once defined, we could identify the key initiatives that would help protect the areas that represent the highest levels of risk.

As illustrated in the graphic below, the access-level roles that pose a higher risk should have fewer accounts—helping reduce the impact to the organization and control entry.

Illustration of the risk-role pyramid we use to help prioritize security initiatives.
The risk-role pyramid.

The next sections focus primarily on protecting elevated user accounts and the “Protect the Administrators” program. We’ll also discuss key security initiatives that are relevant to other engineering organizations across Microsoft.

Implementing the Protect the Administrators program

After doing a deeper analysis of our environments, roles, and access points, we developed a multifaceted approach to protecting our administrators and other elevated-privilege accounts. Key solutions include:

  • Working to ensure that our standards and processes are current, and that the enterprise is compliant with them.
  • Creating a targeted reduction campaign to scale down the number of individuals with elevated-privilege accounts.
  • Auditing elevated-privilege accounts and role management to help ensure that only employees who need elevated access retain elevated-access privileges.
  • Creating a High Value Asset (HVA)—an isolated, high-risk environment—to host a secure infrastructure and help reduce the attack surface.
  • Providing secure devices to administrators. Secure admin workstations (SAWs) provide a “secure keyboard” in a locked-down environment that helps curb credential-theft and credential-reuse scenarios.
  • Reporting metrics and data that help us share our story with corporate leadership as well as getting buy-in from administrators and other users who have elevated-privilege accounts across the company.

Defining your corporate landscape

In the past, equipment was primarily on-premises, and it was relatively easy to keep development, test, and production environments separate, secure, and well-isolated without much crossover. Users often had access to more than one of these environments but used a persistent identity—a unique combination of username and password—to log into all three. After all, it’s easier to remember login information for one persistent identity than it is to create separate identities for each environment. And because we had strict network boundaries, this persistent identity wasn’t a source of concern.

Today, that’s not the case. The advent of the cloud has dissolved the classic network edge. The use of on-premises datacenters, cloud datacenters, and hybrid solutions are common in nearly every company. Using one persistent identity across all environments can increase the attack surface exposed to adversaries. If compromised, it can yield access to all company environments. That’s what makes identity today’s true new perimeter.

At Microsoft, we reviewed our ecosystem to analyze whether we could keep production and non-production environments separate. We used our Red Team/penetration (PEN) testers to help us validate our holistic approach to security, and they provided great guidance on how to further establish a secure ecosystem.

The graphic below illustrates the Microsoft ecosystem, past and present. We have three major types of environments in our ecosystem today: our Microsoft and Office 365 tenants, Microsoft Azure subscriptions, and on-premises datacenters. We now treat them all like a production environment with no division between production and non-production (development and test) environments.

Microsoft ecosystem then and now. Three environment types now: Microsoft/Office 365 tenants, Azure subscriptions, on-premises datacenters.
Now, everything is considered a “production” environment. We treat our three major environments in the Microsoft ecosystem like production.

Refining roles to reduce attack surfaces

Prior to embarking on the “Protect the Administrators” program, we felt it was necessary to evaluate every role with elevated privileges to determine their level of access and capability within our landscape. Part of the process was to identify tooling that would also protect company security (identity, security, device, and non-persistent access).

Our goal was to provide administrators the means to perform their necessary duties in support of the technical operations of Microsoft with the necessary security tooling, processes, and access capabilities—but with the lowest level of access possible.

The top security threats that every organization faces stem from too many employees having too much persistent access. Every organization’s goal should be to dramatically limit their attack surface and reduce the amount of “traversing” (lateral movement across resources) a breach will allow, should a credential be compromised. This is done by limiting elevated-privilege accounts to employees whose roles require access and by ensuring that the access granted is commensurate with each role. This is known as “least-privileged access.” The first step in reaching this goal is understanding and redefining the roles in your company that require elevated privileges.

Defining roles

We started with basic definitions. An information-worker account does not allow elevated privileges, is connected to the corporate network, and has access to productivity tools that let the user do things like log into SharePoint, use applications like Microsoft Excel and Word, read and send email, and browse the web.

We defined an administrator as a person who is responsible for the development, build, configuration, maintenance, support, and reliable operations of applications, networks, systems, and/or environments (cloud or on-premises datacenters). In general terms, an administrator account is one of the elevated-privilege accounts that has more access than an information worker’s account.

Using role-based controls to establish elevated-privilege roles

We used a role-based access control (RBAC) model to establish which specific elevated-privilege roles were needed to perform the duties required within each line-of-business application in support of Microsoft operations. From there, we deduced a minimum number of accounts needed for each RBAC role and started the process of eliminating the excess accounts. Using the RBAC model, we went back and identified a variety of roles requiring elevated privileges in each environment.

For the Microsoft Azure environments, we used RBAC, built on Microsoft Azure Resource Manager, to manage who has access to Azure resources and to define what they can do with those resources and what areas they have access to. Using RBAC, you can segregate duties within your team and grant to users only the amount of access that they need to perform their jobs. Instead of giving everybody unrestricted permissions in our Azure subscription or resources, we allow only certain actions at a particular scope.
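
The scoping described above can be caricatured as a toy access check. The principals, roles, and resource paths here are invented, and real Azure RBAC evaluates role definitions, assignments, and deny assignments far more richly; this sketch only shows the core idea of granting actions at a scope:

```python
# Toy RBAC check: a role grants a set of actions, an assignment binds a
# principal and role to a scope, and a request is allowed only when some
# assignment covers both the action and the resource's scope prefix.

ROLE_ACTIONS = {
    "Reader": {"read"},
    "Contributor": {"read", "write"},
}

# (principal, role, scope prefix) -- names are illustrative.
assignments = [
    ("alice", "Contributor", "/subscriptions/sub1/resourceGroups/rg-web"),
    ("bob", "Reader", "/subscriptions/sub1"),
]

def is_allowed(principal, action, resource):
    return any(
        principal == p
        and action in ROLE_ACTIONS[role]
        and resource.startswith(scope)
        for p, role, scope in assignments
    )
```

Here `bob` can read anywhere under the subscription but write nowhere, while `alice` can write only inside her resource group.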

Performing role attestation

We explored role attestation for administrators who moved laterally within the company to make sure their elevated privileges didn’t move with them into their new roles. Previously, only limited checks and balances were in place to ensure that the right privileges were applied or removed when someone’s role changed. We fixed this through a quarterly attestation process that requires the individual, their manager, and the role owner to approve continued access to the role.

Implementing least-privileged access

We identified those roles that absolutely required elevated access, but not all elevated-privilege accounts are created equal. Limiting the attack surface visible to potential attackers depends not only on reducing the number of elevated-privilege accounts; it also relies on granting each elevated-privilege account only the least-privileged access needed to get its job done.

For example, consider the idea of crown jewels kept in the royal family’s castle. There are many roles within the operations of the castle, such as the king, the queen, the cook, the cleaning staff, and the royal guard. Not everyone can or should have access everywhere. The king and queen hold the only keys to the crown jewels. The cook needs access only to the kitchen, the larder, and the dining room. The cleaning staff needs limited access everywhere, but only to clean, and the royal guard needs access to areas where the king and queen are. No one other than the king and queen, however, needs access to the crown jewels. This system of restricted access provides two benefits:

  • Only those who absolutely require access to a castle area have keys, and only to perform their assigned jobs, nothing more. If the cook tries to access the crown jewels, security alarms notify the royal guard, along with the king and queen.
  • Only two people, the king and queen, have access to the crown jewels. Should anything happen to the crown jewels, a targeted evaluation of those two people takes place and doesn’t require involvement of the cook, the cleaning staff, or the royal guard because they don’t have access.

This is the concept of least-privileged access: We only allow you access to a specific role to perform a specific activity within a specific amount of time from a secure device while logged in from a secure identity.

Creating a secure high-risk environment

We can’t truly secure our devices without having a highly secure datacenter to build and house our infrastructure. We used HVA to implement a multitiered and highly secure high-risk environment (HRE) for isolated hosting. We treated our HRE as a private cloud that lives inside a secure datacenter and is isolated from dependencies on external systems, teams, and services. Our secure tools and services are built within the HRE.

Traditional corporate networks were typically walled only at the external perimeters. Once an attacker gained access, it was easier for a breach to move across systems and environments. Production servers often reside on the same segments or on the same levels of access as clients, so you inherently gain access to servers and systems. If you start building some of your systems but you’re still dependent on older tools and services that run in your production environment, it’s hard to break those dependencies. Each one increases your risk of compromise.

It’s important to remember that security awareness requires ongoing hygiene. New tools, resources, portals, and functionality are constantly coming online or being updated. For example, certain web browsers sometimes release updates weekly. We must continually review and approve the new releases, and then repackage and deploy the replacement to approved locations. Many companies don’t have a thorough application-review process, which increases their attack surface due to poor hygiene (for example, multiple versions, third-party and malware-infested application challenges, unrestricted URL access, and lack of awareness).

The initial challenge we faced was discovering all the applications and tools that administrators were using so we could review, certify, package, and sign them as approved applications for use in the HRE and on SAWs. We also needed to implement a thorough application-review process, specific to the applications in the HRE.

Our HRE was built as a trust-nothing environment. It’s isolated from other less-secure systems within the company and can only be accessed from a SAW—making it harder for adversaries to move laterally through the network looking for the weakest link. We use a combination of automation, identity isolation, and traditional firewall isolation techniques to maintain boundaries between servers, services, and the customers who use them. Admin identities are distinct from standard corporate identities and subject to more restrictive credential- and lifecycle-management practices. Admin access is scoped according to the principle of least privilege, with separate admin identities for each service. This isolation limits the scope that any one account could compromise. Additionally, every setting and configuration in the HRE must be explicitly reviewed and defined. The HRE provides a highly secure foundation that allows us to build protected solutions, services, and systems for our administrators.

Secure devices

Secure admin workstations (SAWs) are limited-use client machines that substantially reduce the risk of compromise. They are an important part of our layered, defense-in-depth approach to security. A SAW doesn’t grant rights to any actual resources—it provides a “secure keyboard” in which an administrator can connect to a secure server, which itself connects to the HRE.

A SAW is an administrative-and-productivity-device-in-one, designed and built by Microsoft for one of our most critical resources—our administrators. Each administrator has a single device, a SAW, where they have a hosted virtual machine (VM) to perform their administrative duties and a corporate VM for productivity work like email, Microsoft Office products, and web browsing.

Administrators must keep their secure devices with them when working, and they are responsible for them at all times. This requirement meant that the secure device had to be portable. As a result, we developed a laptop that’s a securely controlled and provisioned workstation. It’s designed for managing valuable production systems and performing daily activities like email, document editing, and development work. The administrative partition in the SAW curbs credential-theft and credential-reuse scenarios by locking down the environment. The productivity partition is a VM with access like any other corporate device.

The SAW host is a restricted environment:

  • It allows only signed or approved applications to run.
  • The user doesn’t have local administrative privileges on the device.
  • By design, the user can browse only a restricted set of web destinations.
  • All automatic updates from external parties and third-party add-ons or plug-ins are disabled.
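
In the same spirit, the “only signed or approved applications” rule can be sketched as a digest allowlist. Real enforcement relies on code signing and policy engines rather than a hand-built hash check, and the “binaries” below are just demo byte strings:

```python
import hashlib

# Toy allowlist: a binary may run only if its SHA-256 digest matches a
# reviewed, approved build. Digests are computed from demo byte strings.

approved_digests = {
    hashlib.sha256(b"reviewed-admin-tool-v1.2").hexdigest(),
}

def can_run(binary_bytes):
    """Allow execution only for binaries whose digest is on the allowlist."""
    return hashlib.sha256(binary_bytes).hexdigest() in approved_digests
```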

Again, the SAW controls are only as good as the environment that holds them, which means that the SAW isn’t possible without the HRE. Maintaining adherence to SAW and HRE controls requires an ongoing operational investment, similar to any Infrastructure as a Service (IaaS). Our engineers code-review and code-sign all applications, scripts, tools, and any other software that operates or runs on top of the SAW. The administrator user has no ability to download new scripts, coding modules, or software outside of a formal software distribution system. Anything added to the SAW gets reviewed before it’s allowed on the device.

As we onboard an internal team onto SAW, we work with them to ensure that their services and endpoints are accessible using a SAW device. We also help them integrate their processes with SAW services.

Provisioning the administrator

Once a team has adopted the new company standard of requiring administrators to use a SAW, we deploy the Microsoft Azure-based Conditional Access (CA) policy. As part of CA policy enforcement, administrators can’t use their elevated privileges without a SAW. Between the time that an administrator places an order and receives the new SAW, we provide temporary access to a SAW device so they can still get their work done.
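
As a rough sketch of what such a Conditional Access rule enforces (the policy shape and field names here are invented, not the Microsoft Entra schema), elevated-privilege sign-ins succeed only from a SAW:

```python
# Toy Conditional Access-style check: block sign-ins that carry elevated
# privileges unless the device is flagged as a SAW. Fields are illustrative.

def evaluate_sign_in(roles, device):
    if "administrator" in roles and not device.get("is_saw", False):
        return "blocked"
    return "allowed"
```

Information-worker sign-ins pass regardless of device; only the administrator path is gated on the SAW.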

We ensure security at every step within our supply chain. That includes using a dedicated manufacturing line exclusive to SAWs, ensuring chain of custody from manufacturing to end-user validation. Since SAWs are built and configured for the specific user rather than pulling from existing inventory, the process is much different from how we provision standard corporate devices. The additional security controls in the SAW supply chain add complexity and can make scaling a challenge from the global-procurement perspective.

Supporting the administrator

SAWs come with dedicated, security-aware support services from our Secure Admin Services (SAS) team. The SAS team is responsible for the HRE and the critical SAW devices—providing around-the-clock role-service support to administrators.

The SAS team owns and supports a service portal that facilitates SAW ordering and fulfillment, role management for approved users, application and URL hosting, SAW assignment, and SAW reassignment. They’re also available in a development operations (DevOps) model to assist the teams that are adopting SAWs.

As different organizations within Microsoft choose to adopt SAWs, the SAS team works to ensure they understand what they are signing up for. The team provides an overview of their support and service structure and the HRE/SAW solution architecture, as illustrated in the graphic below.

A high-level overview of the HRE/SAW solution architecture, including SAS team and DevOps support services.
An overview of an isolated HRE, a SAW, and the services that help support administrators.

Today, the SAS team provides support service to more than 40,000 administrators across the company. We have more work to do as we enforce SAW usage across all teams in the company and stretch into different roles and responsibilities.

Password vaulting

The password-vaulting service allows passwords to be securely encrypted and stored for future retrieval. This eliminates the need for administrators to remember passwords, which has often resulted in passwords being written down, shared, and compromised.

SAS Password Vaulting is composed of two internal, custom services currently offered through our SAS team:

  • A custom solution to manage domain-based service accounts and shared password lists.
  • A local administrator password solution (LAPS) to manage server-local administrator and integrated Lights-Out (iLO) device accounts.

Password management is further enhanced by the service’s capability to automatically generate and roll complex random passwords. This ensures that privileged accounts have high-strength passwords that are changed regularly and reduces the risk of credential theft.
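
As a sketch of the generation half of that capability, Python’s cryptographically secure `secrets` module can produce high-strength passwords. The length and character-class policy below are assumptions for illustration, not the vaulting service’s actual rules:

```python
import secrets
import string

ALPHABET = string.ascii_letters + string.digits + string.punctuation

def generate_password(length=32):
    """Generate a random password containing all four character classes."""
    while True:
        pw = "".join(secrets.choice(ALPHABET) for _ in range(length))
        # Retry until lowercase, uppercase, digit, and symbol all appear.
        if (any(c.islower() for c in pw)
                and any(c.isupper() for c in pw)
                and any(c.isdigit() for c in pw)
                and any(c in string.punctuation for c in pw)):
            return pw
```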

Administrative policies

We’ve put administrative policies in place for privileged-account management. They’re designed to protect the enterprise from risks associated with elevated administrative rights. Microsoft Digital reduces attack vectors with an assortment of security services, including SAS and Identity and Access Management, that enhance the security posture of the business. Especially important is the implementation of usage metrics for threat and vulnerability management. When a threat or vulnerability is detected, we work with our Cyber Defense Operations Center (CDOC) team. Using a variety of monitoring systems through data and telemetry measures, we ensure that compliance and enforcement teams are notified immediately. Their engagement is key to keeping the ecosystem secure.

Just-in-time entitlement system

Least-privileged access paired with a just-in-time (JIT) entitlement system provides the least amount of access to administrators for the shortest period of time. A JIT entitlement system allows users to elevate their entitlements for limited periods of time to complete elevated-privilege and administrative duties. The elevated privileges normally last between four and eight hours.

JIT allows removal of users’ persistent administrative access (via Active Directory security groups) and replaces those entitlements with the ability to elevate into roles on demand and just in time. We used proper RBAC approaches with an emphasis on providing access only to what is absolutely required. We also implemented access controls to remove excess access (for example, Global Administrator or Domain Administrator privileges).

An example of how JIT is part of our overarching defense-in-depth strategy is a scenario in which an administrator’s smartcard and PIN are stolen. Even with the physical card and the PIN, an attacker would have to successfully navigate a JIT workflow process before the account would have any access rights.
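
A minimal sketch of a time-boxed elevation record follows; the role name and the four-hour default are illustrative, not our actual entitlement system:

```python
from datetime import datetime, timedelta

# Toy JIT elevation: access is valid only inside a bounded time window.

class JitElevation:
    def __init__(self, role, granted_at, duration_hours=4):
        self.role = role
        self.granted_at = granted_at
        self.expires_at = granted_at + timedelta(hours=duration_hours)

    def is_active(self, now):
        """True only while 'now' falls inside the granted window."""
        return self.granted_at <= now < self.expires_at
```

Once the window lapses, the account holds no elevated rights until the user elevates again through the approval workflow.
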
Key Takeaways

In the three years this project has been going on, we have learned that an ongoing commitment and investment are critical to providing defense-in-depth protection in an ever-evolving work environment. We have learned a few things that could help other companies as they decide to better protect their administrators and, thus, their company assets:

  • Securing all environments. We needed to evolve the way we looked at our environments. Through evolving company strategy and our Red Team/PEN testing, it has been proven numerous times that successful system attacks take advantage of weak controls or bad hygiene in a development environment to access and cause havoc in production.
  • Influencing, rather than forcing, cultural change. Microsoft employees have historically had the flexibility and freedom to do amazing things with the products and technology they had on hand. Efforts to impose any structure, rigor, or limitation on that freedom can be challenging. Taking people’s flexibility away from them, even in the name of security, can generate friction. Inherently, employees want to do the right thing when it comes to security and will adopt new and better processes and tools as long as they understand the need for them. Full support of the leadership team is critical in persuading users to change how they think about security. It was important that we developed compelling narratives for areas of change, and had the data and metrics to reinforce our messaging.
  • Scaling SAW procurement. We secure every aspect of the end-to-end supply chain for SAWs. This level of diligence does result in more oversight and overhead. While there might be some traction around the concept of providing SAWs to all employees who have elevated-access roles, it would still be very challenging for us to scale to that level of demand. From a global perspective, it is also challenging to ensure the required chain of custody to get SAWs into the hands of administrators in more remote countries and regions. To help us overcome the challenges of scale, we used a phased approach to roll out the Admin SAW policy and provision SAWs.
  • Providing a performant SAW experience for the global workforce. We aim to provide a performant experience for all users, regardless of their location. We have users around the world, in most major countries and regions. Supporting our global workforce has required us to think through and deal with some interesting issues regarding the geo-distribution of services and resources. For instance, locations like China and some places in Europe are challenging because of connectivity requirements and performance limitations. Enforcing SAW in a global company has meant dealing with these issues so that an administrator, no matter where they are located, can effectively complete necessary work.

What’s next

As we stated before, there are no silver-bullet solutions when it comes to security. As part of our defense-in-depth approach to an ever-evolving threat landscape, there will always be new initiatives to drive.

Recently, we started exploring how to separate our administrators from our developers and using a different security approach for the developer roles. In general, developers require more flexibility than administrators.

There also continue to be many other security initiatives around device health, identity and access management, data loss protection, and corporate networking. We’re also working on the continued maturity of our compliance and governance policies and procedures.

Getting started

While it has taken us years to develop, implement, and refine our multitiered, defense-in-depth approach to security, there are some solutions that you can adopt now as you begin your journey toward improving the state of your organization’s security:

  • Design and enforce hygiene. Ensure that you have the governance in place to drive compliance. This includes controls, standards, and policies for the environment, applications, identity and access management, and elevated access. It’s also critical that standards and policies are continually refined to reflect changes in environments and security threats. Implement governance and compliance to enforce least-privileged access. Monitor resources and applications for ongoing compliance and ensure that your standards remain current as roles evolve.
  • Implement least-privileged access. Least-privileged access means using proper role-based access control (RBAC) with an emphasis on granting only the access that is absolutely required. Add the necessary access controls to remove the need for Global Administrator or Domain Administrator access, and provide everyone with only the access that they truly need. Build your applications, environments, and tools to use RBAC roles, and clearly define what each role can and can't do.
  • Remove all persistent access. All elevated access should require just-in-time (JIT) elevation, which adds an extra step of obtaining temporary secure access before performing elevated-privilege work. Setting persistent access to expire when it's no longer necessary narrows your exposed attack surface.
  • Provide isolated elevated-privilege credentials. Using an isolated identity substantially reduces the possibility of compromise after a successful phishing attack. Admin accounts without an inbox have no email to phish. Keeping the information-worker credential separate from the elevated-privilege credential reduces the attack surface.
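
To make the JIT model above concrete, here's a minimal Python sketch of how time-bound elevation grants might work. The class and method names are hypothetical illustrations of the decision logic only, not a real Microsoft or Azure API; in practice, services such as Microsoft Entra Privileged Identity Management provide this capability.

```python
import time

class JitAccessStore:
    """Illustrative model of just-in-time (JIT) elevation:
    no grant is persistent; every grant carries an expiry."""

    def __init__(self):
        self._grants = {}  # (user, role) -> expiry timestamp

    def elevate(self, user, role, duration_seconds):
        """Grant a role for a limited time window only."""
        self._grants[(user, role)] = time.time() + duration_seconds

    def has_access(self, user, role):
        """Access exists only while the grant is unexpired."""
        expiry = self._grants.get((user, role))
        if expiry is None or time.time() >= expiry:
            self._grants.pop((user, role), None)  # expired: remove the grant
            return False
        return True

store = JitAccessStore()
print(store.has_access("alice-admin", "StorageOperator"))  # → False (no persistent access)
store.elevate("alice-admin", "StorageOperator", duration_seconds=3600)
print(store.has_access("alice-admin", "StorageOperator"))  # → True (within the JIT window)
```

Because every grant expires, the attack surface at any given moment is limited to the handful of elevations currently active, rather than every account that has ever needed elevated access.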

Microsoft Services can help

Customers interested in adopting a defense-in-depth approach to increase their security posture might want to consider implementing Privileged Access Workstations (PAW). PAWs are a key element of the Enhanced Security Administrative Environment (ESAE) reference architecture deployed by the cybersecurity professional services teams at Microsoft to protect customers against cybersecurity attacks.

For more information about engaging Microsoft Services to deploy PAWs or ESAE for your environment, contact your Microsoft representative or visit the Cybersecurity Protection page.

Reaping the rewards

Over the last two years, we've had an outside security audit expert perform a Cyber Essentials Plus certification process. In 2017, the security audit engineers couldn't run most of their baseline tests because the SAW's locked-down configuration blocked them. They called it the "most secure administrative-client audit they've ever completed."

In 2018, the security audit engineer said: “I had no chance; you have done everything right,” and added, “You are so far beyond what any other company in the industry is doing.”

Also, in 2018, our SAW project won a CSO50 Award, which recognizes security projects and initiatives that demonstrate outstanding business value and thought leadership. SAW was commended as an innovative practice and a core element of the network security strategy at Microsoft.

Ultimately, the certifications and awards help validate our defense-in-depth approach. We're building and deploying the right solutions to support our ongoing commitment to securing Microsoft and our customers' and partners' information, and it's a pleasure to see those solutions recognized as industry leading.

Related links

We'd like to hear from you!
Want more information? Email us and include a link to this story and we’ll get back to you.

Please share your feedback with us—take our survey and let us know what kind of content is most useful to you.

The post Improving security by protecting elevated-privilege accounts at Microsoft appeared first on Inside Track Blog.

]]>
9774
Building an anti-ransomware program at Microsoft focused on an Optimal Ransomware Resiliency State http://approjects.co.za/?big=insidetrack/blog/building-an-anti-ransomware-program-at-microsoft-focused-on-an-optimal-ransomware-resiliency-state/ Wed, 19 Jun 2024 15:07:43 +0000 http://approjects.co.za/?big=insidetrack/blog/?p=9493 Microsoft strives to deliver the productivity tools and services the world depends on. With this comes the responsibility of ensuring protection, continuity, and resilience from cyberattacks of all sorts—including emerging threats. Highlighted in the third edition of the Microsoft Digital Defense Report, ransomware and extortion are considered nation-level threats due to the sophistication and boldness […]

The post Building an anti-ransomware program at Microsoft focused on an Optimal Ransomware Resiliency State appeared first on Inside Track Blog.

]]>
Microsoft Digital storiesMicrosoft strives to deliver the productivity tools and services the world depends on. With this comes the responsibility of ensuring protection, continuity, and resilience from cyberattacks of all sorts—including emerging threats.

Highlighted in the third edition of the Microsoft Digital Defense Report, ransomware and extortion are considered nation-level threats due to the sophistication and boldness of attacks and their financial impact. No business, organization, or government can consider itself outside the crosshairs of ransomware threat actors. Experts estimate that ransomware's cost to the world could reach $234 billion within the next decade.

To defend against the evolving ransomware landscape, Microsoft created the Optimal Ransomware Resiliency State (ORRS), a key component of its Ransomware Elimination Program.

This post, the third in our series on ransomware, overviews the concept of ORRS and the steps that you can take to build a ransomware resiliency state of your own.

[Read blog one in our ransomware series: Sharing how Microsoft protects against ransomware. | Read blog two in our ransomware series: Why Microsoft uses a playbook to guard against ransomware.]

What is ORRS?

Optimal Ransomware Resiliency State is the term that the Ransomware Elimination Program team uses to describe our aspiration to defeat ransomware attacks—today and in the future.

Optimal means we’re doing everything we can do—all the ORRS-required capabilities and controls are in place and verified.

—Monty LaRue, principal program manager, Ransomware Elimination Program team

LaRue poses for a portrait photo in front of a wall and plant.
Monty LaRue is the principal program manager on the Ransomware Elimination Program team.

Specifically, ORRS is the outcome of meeting the requirements covering an extensive set of protection and operational capabilities. Built on the foundation of Zero Trust, our ORRS consists of the collection of requirements for training, capabilities, and controls aligned to the NIST Cybersecurity framework and supported by continuously improved processes and practices. These requirements are common across Microsoft’s business, service, and product groups. Their complete implementation produces an organization-wide state of readiness that protects and defends the company and its customers, while also minimizing exposure and increasing resiliency to ransomware attacks.

“Optimal means we’re doing everything we can do—all the ORRS-required capabilities and controls are in place and verified,” says Monty LaRue, the principal program manager on the Ransomware Elimination Program team.

"It's about achieving that optimal state through the deployment and operationalization of products like Microsoft Defender for Endpoint for devices, covering our assets, applications, and infrastructure. We consider training and awareness to be a crucial part of ORRS. It's essential that everyone knows how to recognize threats and how to respond appropriately. Our toolkit includes incident response plans and playbooks, phishing education and simulation, and other simulation exercises."

Partnerships are key to producing optimal resiliency

The role of partnerships and teamwork cannot be overstated in the development and maintenance of our Optimal Ransomware Resiliency State. The approach must be holistic and cohesive, closing gaps and seams where possible.

Collaboration and open lines of communication with key stakeholders across Microsoft ensure that products and systems with protection needs are accounted for; likewise, Microsoft’s Ransomware team provides requirements to partnering teams to ensure they are equipped and running the latest defensive measures to minimize their attack surface. All involved parties have a deep understanding of their role in keeping the enterprise and our customers safe.

"We're looking at Microsoft 365, Windows, and Azure," LaRue says. "We're looking at the people running MacOS, Linux, and personal devices within Microsoft. If the platforms and foundations follow Zero Trust principles and are highly resilient to ransomware attacks, everything built on top shares that benefit."

The Ransomware Elimination Program (REP) team also has close ties to Microsoft's threat intelligence and research teams, which regularly provide information on the threat landscape and on how attackers' techniques, tactics, and procedures evolve and trend. They also work with internal Security Operations Centers (SOCs), which monitor threat actors and provide insights via attack data and post-mortems.

The more you prevent and protect, the less you have to respond and recover. The further you are in an attack sequence, the more complex and expensive it is to respond and recover.

—Monty LaRue, principal program manager, Ransomware Elimination Program team

Maintaining our Optimal Ransomware Resiliency State also involves using existing technology, such as the Microsoft Defender suite, with a continuous improvement approach to take advantage of its latest capabilities and threat information. Learnings and insights from the ransomware program team flow back to the product and engineering teams in the form of enhancements or new requirements and features, helping to further improve our commercial products and services. One example is the detection of abnormal file activities, such as encryption or exfiltration, for data stores and backups in commercial services such as OneDrive, SharePoint, and Microsoft Azure, which extends beyond Microsoft's walls to protect all customers.

The practice of continuous improvement is also applied to the response procedures that make up the ransomware incident response playbook. Tabletop exercises based on new threats and information help to uncover gaps in response procedures, while simulations stress test the response system to ensure the involved security professionals have response readiness excellence should an attack ever breach our protective capabilities and controls.

Our commitment to company-wide alignment reduces the risk of a successful attack and the chance of a resulting payoff. “The more you prevent and protect, the less you have to respond and recover,” LaRue says. “The further you are in an attack sequence, the more complex and expensive it is to respond and recover.”

Building toward an optimal state

As we’ve seen throughout this series, ransomware is evolving and attackers are opportunistic. The goalposts for protection continue to shift, and ransomware’s impact on the world shows no signs of slowing. Because of this, there is no universal optimal resiliency state. Every organization’s situation is unique, from level of exposure to threats, to capabilities and services deployed, to protection needs, so every organization’s optimal state must be tailored to their business and risk tolerances.

"The Optimal Ransomware Resiliency State means different things to each organization. It differs depending on whether your systems are physical, in the cloud, or hybrid; whether you provide high-availability services or large data stores; and whether you work with highly confidential or sensitive data in regulated environments," LaRue says.

The task of building an optimal ransomware resiliency state begins with a comprehensive inventory of the current state, and that means asking a lot of questions and doing verifications. Start with an understanding of which business-critical systems and services across the organization must be defended and why. It also means understanding the systems themselves: their dependencies, which configurations and controls are enabled, and the state of existing ransomware readiness capabilities. Such an inventory can shed light on high-value targets and the unforeseen risks to them, exposing potential weaknesses and highlighting strengths.

The process of establishing your current state is insightful and has the potential to be humbling, but it encourages taking the next steps in developing your ORRS roadmap. This may include investments in training for response readiness or new technologies to reduce attack surface risk, but all optimal resiliency states require implementing a continuous improvement process to keep the organization and those that depend on it safe now and in the future.

Microsoft’s investment in the Ransomware Elimination Program highlights our commitment to defeating successful ransomware attacks. Establishing our ORRS provides us with learnings and guides us to improving our security posture, which helps the company produce secure and dependable products and services.

Ransomware may be one of the biggest security threats to your organization. Taking up the challenge to develop your own ransomware resiliency state will put you on a path forward to protecting and defending what matters most.

Key Takeaways

  • You will define optimal for your organization, but attackers will always be looking for new avenues. You must be able to shift focus and update your ORRS quickly to match the threat and the attackers' agility.
  • Ransomware elimination starts with a shared understanding, frameworks such as Zero Trust, and defining your ORRS. Core protections such as MFA, pervasive backups, and comprehensive telemetry and alerts, delivered as part of a holistic, cohesive effort that spans devices and services, are crucial in responding to cyberthreats like ransomware.
  • Implementing tamper-resistant security capabilities and controls, along with attack surface reductions, reduces your malware-related risks.
  • Identifying the right investments is difficult, especially when threats and attackers are moving fast. Engage early and often within your organization to understand your assets, risks, and current state as you define your ORRS and implement capabilities, controls, processes, and practices.

Related links

We'd like to hear from you!
Want more information? Email us and include a link to this story and we’ll get back to you.

Please share your feedback with us—take our survey and let us know what kind of content is most useful to you.

The post Building an anti-ransomware program at Microsoft focused on an Optimal Ransomware Resiliency State appeared first on Inside Track Blog.

]]>
9493
Creating a manageable Microsoft Azure subscription model http://approjects.co.za/?big=insidetrack/blog/creating-a-manageable-microsoft-azure-subscription-model/ Thu, 06 Jun 2024 21:15:39 +0000 http://approjects.co.za/?big=insidetrack/blog/?p=15103 Editor’s note: This story was written by a bot powered by Microsoft Azure OpenAI. The bot interviews subject matter experts in Microsoft Digital to generate new stories quickly. We have humans in the loop to ensure the accuracy and completeness of our AI-powered stories. In the rapidly evolving world of cloud services, managing technical subscriptions […]

The post Creating a manageable Microsoft Azure subscription model appeared first on Inside Track Blog.

]]>
Inside Track bot storyEditor’s note: This story was written by a bot powered by Microsoft Azure OpenAI. The bot interviews subject matter experts in Microsoft Digital to generate new stories quickly. We have humans in the loop to ensure the accuracy and completeness of our AI-powered stories.

In the rapidly evolving world of cloud services, managing technical subscriptions can become a daunting task. At Microsoft, we faced a similar challenge—Microsoft Azure subscription sprawl.

“If customers don’t have a formal system in place to manage their Azure subscriptions, it can lead to subscription sprawl,” says Trey Morgan, a principal product manager on our Microsoft Digital Azure Optimization team. “This can cause potential legal and security risks.”

Our solution?

The Azure Information Request System (AIRS).

The impact of AIRS has been significant, particularly in governance and cost management. By assigning subscriptions to the business hierarchy from day one, they don’t get lost in a company of our size. We can quickly identify who to contact for security issues, cost issues, and understand how these cloud resources fit into Microsoft’s business.

— Trey Morgan, principal product manager, Microsoft Digital Azure Optimization team

AIRS streamlines the process of setting up new Azure subscriptions.

Portrait photo of Morgan.
Trey Morgan is a principal product manager on our Microsoft Digital Azure Optimization team.

“AIRS is an internal system we’ve developed that offers a solution to govern and track subscriptions, a strategy that Microsoft has effectively used,” Morgan says.

Users requesting a new subscription fill out a form detailing cost assignment and ownership. The system also helps assign the subscription to our business hierarchy, providing visibility on where the cloud resources fit within the company.

“The impact of AIRS has been significant, particularly in governance and cost management,” Morgan says. “By assigning subscriptions to the business hierarchy from day one, they don’t get lost in a company of our size. We can quickly identify who to contact for security issues, cost issues, and understand how these cloud resources fit into Microsoft’s business.”

We’ve also integrated AIRS with tooling that benefits several different parts of our business.

“Azure governance, security, finance, and leadership all benefit from AIRS,” Morgan says. “Without it, we would lack crucial information about these Azure subscriptions or why they exist.”

Azure subscription sprawl strategies

To prevent subscription sprawl in your Azure environment, consider implementing the following strategies:

  • Consistent landing zones: Establish consistent landing zones based on application archetype subscription strategies. This approach minimizes the growth of subscriptions by providing predefined structures for different types of workloads.
  • Requisite components definition: Expand the definition of requisite components to better align with the governance and compliance needs of a mature cloud enterprise. Clearly define what components are necessary for each subscription, ensuring that they meet organizational standards.
  • Subscription policies: Control the movement of Azure subscriptions into and out of the current directory. Global administrators can allow or disallow users from changing the directory of an Azure subscription. For specific scenarios, configure a list of exempted users who can bypass the policy settings that apply to everyone else.
  • Restrict self-service subscriptions: Disable self-service purchases to prevent standard users from creating subscriptions without proper authorization.
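
As an illustration of the subscription-policy and exemption-list ideas above, here's a small Python sketch. The class and names are hypothetical and only model the decision logic, not the actual Azure subscription policy API.

```python
class SubscriptionDirectoryPolicy:
    """Illustrative model of a subscription policy: directory moves
    are blocked by default, with an exemption list of trusted users."""

    def __init__(self, allow_moves=False, exempted_users=None):
        self.allow_moves = allow_moves
        self.exempted_users = set(exempted_users or [])

    def can_change_directory(self, user):
        # Exempted users bypass the policy that applies to everyone else.
        return self.allow_moves or user in self.exempted_users

policy = SubscriptionDirectoryPolicy(exempted_users=["cloud-governance@contoso.com"])
print(policy.can_change_directory("dev1@contoso.com"))               # → False (blocked by policy)
print(policy.can_change_directory("cloud-governance@contoso.com"))   # → True (exempted)
```

Centralizing a decision like this in one place, rather than trusting each subscription owner individually, is what keeps the exemption list short and auditable.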

Remember that effective governance and clear policies play a crucial role in managing subscription sprawl and maintaining a well-organized Azure environment. As we continue to evolve and improve AIRS, we hope our journey can provide valuable insights for other companies navigating their own cloud subscription challenges.

The future of AIRS

Having a company operating model and policies is effective only if IT leaders adhere to them and regularly review cloud subscriptions and usage to get the greatest security, flexibility, and value from them. As we look to the future, we're confident that AIRS will continue to evolve and provide even greater benefits to the company.

Key Takeaways

Consider using a system like AIRS to streamline the process of setting up new Azure subscriptions and assign them to the business hierarchy. Here are some tips on how you can get started at your company:

  • Establish consistent landing zones based on application archetype subscription strategies to minimize the growth of subscriptions.
  • Expand the definition of requisite components to align with the governance and compliance needs of a mature cloud enterprise.
  • Control the movement of Azure subscriptions in and out of the current directory by setting subscription policies.
  • Disable self-service purchases to prevent standard users from creating subscriptions without proper authorization.
  • Remember that effective governance and clear policies play a crucial role in managing subscription sprawl and maintaining a well-organized Azure environment.

Try it out

Create your Microsoft Azure free account today.

Related links

We'd like to hear from you!

Want more information? Email us and include a link to this story and we’ll get back to you.

Please share your feedback with us—take our survey and let us know what kind of content is most useful to you.

The post Creating a manageable Microsoft Azure subscription model appeared first on Inside Track Blog.

]]>
15103
Empowering our employees with generative AI while keeping the company secure http://approjects.co.za/?big=insidetrack/blog/empowering-our-employees-with-generative-ai-while-keeping-the-company-secure/ Thu, 30 May 2024 23:45:48 +0000 http://approjects.co.za/?big=insidetrack/blog/?p=15012 Generative AI (GenAI) is rapidly changing the way businesses operate, and everyone wants to be in on the action. Whether it’s to automate tasks or enhance efficiency, the allure of what GenAI can do is strong. However, for companies considering the adoption of GenAI, there are a multitude of challenges and risks that must be […]

The post Empowering our employees with generative AI while keeping the company secure appeared first on Inside Track Blog.

]]>
Generative AI (GenAI) is rapidly changing the way businesses operate, and everyone wants to be in on the action. Whether it’s to automate tasks or enhance efficiency, the allure of what GenAI can do is strong.

However, for companies considering the adoption of GenAI, there are a multitude of challenges and risks that must be navigated. These range from data exposure or exfiltration, where your company's sensitive data can be accessed by unintended audiences, to direct attacks on the models and data sources that underpin them. Not acting and waiting until the world of GenAI settles down poses its own risk: employees eager to try out the latest and greatest will start using GenAI tools and products that aren't vetted for use in your enterprise's environment. It's safe to say that we're not just in the era of Shadow IT but Shadow AI, too.

Add to that the fact that threat actors have begun to use these tools in their activities, and you get a real sense that navigating the cyberthreat landscape of today and tomorrow will be increasingly difficult—and potentially headache-inducing!

Here at Microsoft, our Digital Security & Resilience (DSR) organization’s Securing Generative AI program has focused on solving this problem since day one: How do we enable our employees to take advantage of the next generation of tools and technologies that enable them to be productive, while maintaining safety and security?

Building a framework for using GenAI securely

At any given moment, there are dozens of teams working on GenAI projects across Microsoft and dozens of new AI tools that employees are eager and excited to use to boost their productivity or use to be more creative.

When establishing our Securing AI program, we wanted to use as many of our existing systems and structures for the development, implementation, and release of software within Microsoft as possible. Rather than start from scratch, we looked at processes and workstreams that were already established and familiar for our employees and worked to integrate AI rules and guidance into them, such as the Security Development Lifecycle (SDL) and the Responsible AI Impact Assessment template.

Successfully managing the secure roll-out of a technology of this scale and importance takes the collaboration and cooperation of hundreds of people across the company, with representatives from diverse disciplines ranging from engineers and researchers working on the cutting edge of AI technology, to compliance and legal specialists, through to privacy advocates.

Portraits of Roy, Peterson, Enjeti, and Sharma are included together in a collage.
Justin Roy, Lee Peterson, Prathiba Enjeti, and Vivek Vinod Sharma are part of a team at Microsoft working to keep the company secure while allowing our employees to get the most out of GenAI.

We work extensively with our partners in Microsoft Security, Aether (AI Ethics and Effects in Engineering and Research), the advisory body for Microsoft leadership on AI ethics and effects, and the extended community of Responsible AI. We also work with security champions who are embedded in teams and divisions across the enterprise. Together, this extended community helps develop, test, and validate the guidance and rules that AI experiences must adhere to for our employees to safely use them.

One of the most popular frameworks for successful change management is the three-legged stool, a simple metaphor emphasizing the need for even effort across the domains of technology, processes, and people. We've focused our efforts to secure GenAI on strengthening and reinforcing the data governance for our technologies, integrating AI security into existing systems and processes, and addressing the human factor by fostering collaboration and community with our employees. The recent announcement of the Secure Future Initiative, with its six security pillars, emphasizes security as a top priority across the company to advance cybersecurity protections.

Incorporating AI-focused security into existing development and release practices

The SDL has been central to our development and release cycle at Microsoft for more than a decade, ensuring that what we develop is secure by design, by default, and secure in deployment. We focused on strengthening the SDL to handle the security risks posed by the technology underlying GenAI.

We’ve worked to enhance embedded security requirements for AI, particularly in monitoring and threat detection. Mandating audit logging at the platform level for all systems provides visibility into which resources are accessed, which models are used, and the type and sensitivity of the data accessed during interactions with our various Copilot offerings. This is crucial for all AI systems, including large language models (LLMs), small language models (SLMs), and multimodal models (MMMs) that focus on partial or total task completion.
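
As a sketch of the kind of platform-level audit record described above, here's a minimal Python example that captures who accessed which resource, with which model, and at what data sensitivity. The field names and schema are illustrative assumptions, not the actual logging format used at Microsoft.

```python
import json
import datetime

def audit_log_entry(user, resource, model, data_sensitivity):
    """Build an illustrative audit record for one AI interaction,
    covering the resource accessed, the model used, and the
    sensitivity of the data involved. The schema is hypothetical."""
    return json.dumps({
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "user": user,
        "resource": resource,
        "model": model,
        "data_sensitivity": data_sensitivity,
    })

entry = audit_log_entry("alice@contoso.com", "sharepoint://finance/q3-report",
                        "gpt-4", "Highly Confidential")
print(entry)
```

Emitting structured records like this at the platform level, rather than leaving logging to each application, is what makes it possible to answer "which sensitive data did this model touch?" after the fact.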

Preventative measures are an equally important part of our journey to securing GenAI, and there’s no shortage of work that’s been done on this front. Our threat modeling standards and red teaming for GenAI systems have been revamped to help engineers and developers consider threats and vulnerabilities tied to AI. All systems involving GenAI must go through this process before being deployed to our data tenant for our employees to use. Our standards are under constant review and are updated based on the discoveries from our researchers and the Microsoft Security Response Center.

Sharing our acceptance criteria for AI systems

As GenAI and the types of risks and threats to models and systems are ever evolving, so too is our acceptance criteria for deploying AI to the enterprise. Here are some of the key points we take into consideration for our acceptance criteria:

Representatives from diverse disciplines: Our journey begins when a diverse team of experts comes together: engineers, compliance teams, security SMEs, privacy advocates, and legal minds. Their collective wisdom ensures a holistic perspective.

Evaluate against enterprise standards: Every GenAI feature is subjected to rigorous scrutiny against our enterprise standards. This isn't a rubber-stamp exercise; it's a deep dive into ethical considerations, potential security, privacy, and AI risks, and alignment with the Responsible AI standard.

Risk assessment and management: The risk workflow starts in our system to amplify risk awareness and management across leadership teams. It's more than a formality; it's a structured process that keeps us accountable. Risks evolve, and so do our mitigation strategies, which is why we revisit the risk assessment of a feature every three to six months. Our assessments are a living guide that adapts to the landscape.

Phased deployment to companywide impact: We use a phased deployment that allows us to monitor, learn, and fine-tune before expanding to the whole company.

Risk contingency planning: This isn’t about avoiding risks altogether; it’s about managing them. By addressing concerns upfront, we ensure that GenAI deployment is safe, secure, and aligned with our values.

By integrating AI into these existing processes and systems, we help ensure that our people are thinking about the potential risks and liabilities involved in GenAI throughout the development and release cycle—not only after a security event has occurred.

Improving data governance

While keeping GenAI models and AI systems safe from threats and harms is a top priority, this alone is insufficient for us to consider GenAI secure and safe. We also see data governance as essential to prevent improper access and improper use, and to reduce the chance of data exfiltration, accidental or otherwise.

Graphic showing the elements of GenAI security governance, including discovering risk, protecting apps, and governing usage.
Discovery, protection, and governance are key elements to protecting the company while enabling our employees to take advantage of GenAI.

At the heart of our data governance strategy is a multi-part expansion of our labeling and classification efforts, which applies at both the model level and the user level.

We set default labels across our platforms and the containers that store them using Purview Information Protection to ensure consistent and accurate tagging of sensitive data by default. We also employ auto-labeling policies where appropriate for confidential or highly confidential documents based on the information they contain. Data hygiene is an essential part of this framework; removing outdated records held in containers such as SharePoint reduces the risk of hallucinations or surfacing incorrect information and is something we reinforce through periodic attestation.

To prevent data exfiltration, we rely on our Purview Data Loss Prevention (DLP) policies to identify sensitive information types and automatically apply the appropriate policy controls at the application or service level (e.g., Microsoft 365), and on Defender for Cloud Apps (DCA) to detect the use of risky websites and applications and, if necessary, block access to them. By combining these methods, we're able to reduce the risk of sensitive data leaving our corporate perimeter, accidentally or otherwise.
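
To illustrate the general shape of a DLP decision in simplified form, here's a Python sketch that flags a sensitive information type with a regex and blocks risky destinations. The pattern and policy are toy assumptions for illustration only, far simpler than Purview's real classifiers.

```python
import re

# Toy example of one sensitive information type: a naive
# credit-card-like number. Real classifiers are far more robust.
SENSITIVE_PATTERNS = {
    "credit_card": re.compile(r"\b(?:\d{4}[- ]?){3}\d{4}\b"),
}

def dlp_action(text, destination_is_trusted):
    """Decide whether content may leave: block when a sensitive
    information type is detected and the destination is untrusted."""
    for info_type, pattern in SENSITIVE_PATTERNS.items():
        if pattern.search(text):
            return "allow" if destination_is_trusted else "block"
    return "allow"

print(dlp_action("card: 4111-1111-1111-1111", destination_is_trusted=False))  # → block
print(dlp_action("quarterly notes, nothing sensitive", destination_is_trusted=False))  # → allow
```

The key design point the sketch captures is that the decision combines two signals, what the content contains and where it is going, rather than relying on either one alone.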

Encouraging deep collaboration and sharing of best practices

So far, we’ve covered the management of GenAI technologies and how we ensure that these tools are safe and secure to use. Now it’s time to turn our attention to our people, the employees who work with and build with these GenAI systems.

We believe that anyone should be able to use GenAI tools confidently, knowing that they are safe and secure. But doing so requires essential knowledge, which might not be entirely self-evident. We’ve taken a three-pronged approach to solving this need with training, purpose-made resource materials, and opportunities for our people to develop their skills.

All employees and contract staff working at Microsoft must take our three-part mandatory companywide security training released throughout the year. The safe use of GenAI is comprehensively covered, including guidance on what AI tools to use and when to use them. Additionally, we’ve added extensive guidance and documentation to our internal digital security portal ranging from what to be mindful of when working with LLMs to the tools which are best suited to various tasks and projects.

With so many of our employees wanting to learn how to use GenAI tools, we’ve worked with teams across the company to create resources and venues where our employees can roll up their sleeves and work with AI hands-on in a way that’s safe and secure. Hackathons are a big deal at Microsoft, and we’ve partnered with several events including the main flagship event, which draws in more than 50,000 attendees. The Skill-Up AI presentation series hosted by our partners at the Microsoft Garage allows curious employees to learn the safe and secure way to use the latest GenAI technologies not only in their everyday work, but also in their creative endeavors. By integrating guidance into the learning journey, we help enable safe use of GenAI without stifling creativity.

Key Takeaways

Here are our suggestions on how to empower your employees with GenAI while also keeping your company secure:

  • Understand the challenges and risks associated with adopting GenAI technology at your company. Good places to start are assessing the potential for data exposure, direct attacks on models and data sources, and the risks associated with Shadow AI.
  • Develop resources and guidance to educate your employees on the risks of using AI. Foster collaboration and a strong community in support of the secure use of GenAI.
  • If applicable, incorporate AI-focused security into existing development and release practices. Check out the Security Development Lifecycle (SDL) and the Responsible AI Impact Assessment template for inspiration.
  • Work to bolster your data governance policies. We strongly recommend starting with labeling and classification efforts, employing auto-labeling policies, and improving data hygiene. Consider tools such as Purview Data Loss Prevention (DLP) and Defender for Cloud Apps to prevent data exfiltration and limit improper data access.

Try it out

Learn more about our overall approach to internal GenAI governance here at Microsoft.

Related links

We'd like to hear from you!
Want more information? Email us and include a link to this story and we’ll get back to you.

Please share your feedback with us—take our survey and let us know what kind of content is most useful to you.

The post Empowering our employees with generative AI while keeping the company secure appeared first on Inside Track Blog.

Modernizing our Network Access Control Infrastructure with Azure
http://approjects.co.za/?big=insidetrack/blog/modernizing-our-network-access-control-infrastructure-with-azure/
Wed, 08 May 2024 18:55:01 +0000

“How do I get on the Wi-Fi?”

It’s often one of the first questions visitors ask when arriving at a new location, and here at Microsoft, it happens every day.

Our guest Wi-Fi network is typically one of the first services used by visitors at Microsoft. We’re continually evaluating and improving guest Wi-Fi services to increase service resiliency, simplify the registration process, and create seamless connectivity for various device types.

Our network access control (NAC) solution authenticates devices that connect to the guest Wi-Fi service and ensures that these devices are placed in the appropriate network location. Every device—corporate, guest, or Internet of Things (IoT)—must use the NAC solution to gain access to the network.

Like many organizations, we hosted our NAC solution on on-premises components and infrastructure for years. However, with the end of support looming for our previous NAC solution, we recently migrated NAC for our guest Wi-Fi to the cloud on Microsoft Azure.

Our previous solution presented several challenges to running a robust, resilient, and highly available guest Wi-Fi service that continuously met the requirements and demands of our guest users. Those challenges included:

  • Costly infrastructure: All hardware fails eventually. These failures forced ongoing maintenance, and our design teams needed to deploy redundant components to ensure service continuity and adequate performance. The effort to perform maintenance—including returning and ordering hardware, implementing failover protocols, and tracking processes—was expensive.
  • Lack of agility: Planned work windows on upstream infrastructure sometimes required us to implement temporary changes or workarounds to keep the service running. This restricted the time when partner teams could perform upgrades or make configuration changes, slowing everyone down.
  • Lack of regional resiliency: We needed to schedule software upgrades per region without a cross-regional load-balancing solution for the Wi-Fi registration portal. This required coordination with multiple partner teams around the world.
  • Lack of scalability: The high cost of redundancy for physical infrastructure made it too expensive to maintain a scalable solution across all regions. Hosting a NAC solution for every office or data center with Wi-Fi was cost-prohibitive, so redundancy and scalability were problems for some regions.

Hosting NAC services in Microsoft Azure

We assessed the challenges alongside potential solutions for a new NAC solution and decided to host our NAC services in Microsoft Azure.

NAC services are, ultimately, software-defined networking, and we took the opportunity to distribute our NAC services in Azure, where cloud networking and infrastructure as code allow us to radically improve scalability, resiliency, and agility for our NAC services. We’re hosting NAC services across four Azure global regions for redundancy, high availability, and performance purposes.

We're hosting NAC services on pre-configured images from the Azure Marketplace. These images can be deployed, redeployed, or upgraded using Azure Resource Manager (ARM) templates, significantly reducing maintenance effort. They're also hosted on natively resilient Azure virtual machine architecture, creating cost-effective scalability and redundancy.
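As a simplified sketch of this pattern, an ARM template can deploy a Marketplace appliance image as a virtual machine. The resource names, publisher/offer/SKU values, and VM size below are placeholders for illustration, not our actual configuration:

```json
{
  "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
  "contentVersion": "1.0.0.0",
  "resources": [
    {
      "type": "Microsoft.Compute/virtualMachines",
      "apiVersion": "2023-03-01",
      "name": "nac-vm-01",
      "location": "[resourceGroup().location]",
      "plan": {
        "name": "example-nac-sku",
        "publisher": "example-publisher",
        "product": "example-nac-offer"
      },
      "properties": {
        "hardwareProfile": { "vmSize": "Standard_D4s_v5" },
        "storageProfile": {
          "imageReference": {
            "publisher": "example-publisher",
            "offer": "example-nac-offer",
            "sku": "example-nac-sku",
            "version": "latest"
          }
        },
        "networkProfile": {
          "networkInterfaces": [
            { "id": "[resourceId('Microsoft.Network/networkInterfaces', 'nac-vm-01-nic')]" }
          ]
        }
      }
    }
  ]
}
```

Because the template fully describes the deployment, redeploying or upgrading an instance becomes a matter of rerunning the template with an updated image version rather than hands-on server maintenance.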

Azure ExpressRoute connects our on-premises Wi-Fi networks to the cloud-hosted NAC services in Azure. Using the Azure backbone network, we achieve high-performance, low-latency connectivity between our Wi-Fi networks and the cloud-hosted NAC services. We use redundant configuration and peering for our ExpressRoute connections, which gives us scalability and resiliency benefits, and we can perform maintenance on any on-premises router or ExpressRoute configuration without impacting service availability.

Azure Traffic Manager allows us to standardize our guest services across hundreds of Wi-Fi controllers globally with a single, captive portal URL that ensures all users reach the nearest NAC instance in Azure. Azure Traffic Manager’s built-in redundancy ensures that traffic is re-routed seamlessly to a secondary Azure region if a regional outage occurs.
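A Traffic Manager profile that behaves as described uses the Performance routing method with an Azure endpoint per region. The profile name, DNS name, health-probe path, and endpoint resource below are illustrative placeholders, not our production values:

```json
{
  "type": "Microsoft.Network/trafficManagerProfiles",
  "apiVersion": "2022-04-01",
  "name": "guest-nac-portal",
  "location": "global",
  "properties": {
    "trafficRoutingMethod": "Performance",
    "dnsConfig": { "relativeName": "guest-nac-portal", "ttl": 30 },
    "monitorConfig": { "protocol": "HTTPS", "port": 443, "path": "/health" },
    "endpoints": [
      {
        "type": "Microsoft.Network/trafficManagerProfiles/azureEndpoints",
        "name": "westus-portal",
        "properties": {
          "targetResourceId": "[resourceId('Microsoft.Network/publicIPAddresses', 'nac-westus-pip')]",
          "endpointStatus": "Enabled"
        }
      }
    ]
  }
}
```

With Performance routing, DNS resolution directs each user to the lowest-latency healthy endpoint; if the monitor marks a region's endpoint as degraded, traffic automatically fails over to the next-closest region.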

Azure Load Balancer handles the distribution and monitoring of NAC RADIUS traffic across multiple NAC virtual machine instances in a region. Azure Load Balancer monitors NAC component health and ensures that RADIUS requests are routed only to the NAC virtual machines ready to receive incoming requests. Defining a health check on the upstream load balancer removes the requirement for network devices to retry and failover individually, reducing end users’ authentication time.
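Because Azure Load Balancer health probes use TCP, HTTP, or HTTPS rather than UDP, a common pattern is to balance the RADIUS UDP traffic itself while probing a status port on each NAC appliance. The fragment below sketches that arrangement; the probe port and the placeholder resource IDs are assumptions for illustration:

```json
{
  "loadBalancingRules": [
    {
      "name": "radius-auth",
      "properties": {
        "protocol": "Udp",
        "frontendPort": 1812,
        "backendPort": 1812,
        "frontendIPConfiguration": { "id": "<frontend-ip-configuration-resource-id>" },
        "backendAddressPool": { "id": "<backend-address-pool-resource-id>" },
        "probe": { "id": "<nac-health-probe-resource-id>" }
      }
    }
  ],
  "probes": [
    {
      "name": "nac-health",
      "properties": {
        "protocol": "Tcp",
        "port": 443,
        "intervalInSeconds": 5,
        "numberOfProbes": 2
      }
    }
  ]
}
```

When a probe fails, the load balancer stops sending new RADIUS requests to that instance, so clients never wait out a per-server RADIUS timeout before failing over.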

Azure Backup allows us to seamlessly manage business continuity and disaster recovery for each NAC virtual machine. We use virtual hard disk snapshots to capture the virtual machine state and enable rapid, integrated recovery for any NAC virtual machine, regardless of region.

Azure Serial Console provides a reliable connection to the Azure command-line interface for each virtual machine, even when network configuration or issues prevent connection using the IP address.

Transitioning NAC services seamlessly to the cloud

Our migration required extensive planning to ensure that our NAC services remained available throughout. We operated the previous on-premises NAC service and the new cloud-based NAC service in Azure in parallel, migrating individual buildings in a phased approach. This allowed us to review telemetry and collect feedback from a subset of migrated sites before progressing to the next set of buildings, reducing the risk of widespread impact if we encountered an unforeseen issue or anomaly.

Developing quality network automation was critical to this project’s success. We created an Ansible playbook that evaluated the on-premises service configuration on 900 wireless controllers. The playbook only applied the updated configuration if the on-premises configuration was standard. This helped us avoid negative impact on sites that had custom, site-specific guest service configurations. We also used this playbook to validate RADIUS connectivity to the new NAC servers before applying the final configuration to ensure all access control lists were correctly defined for each site.
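The playbook's validate-then-apply flow can be sketched as follows. This is an illustrative outline only: the module choices, CLI commands, baseline marker, and portal URL are placeholders that would vary with the controller platform, not our production playbook:

```yaml
# Illustrative sketch: commands and the "standard" baseline check are
# placeholders; adapt them to your wireless controller platform.
- name: Validate and update guest Wi-Fi configuration on wireless controllers
  hosts: wireless_controllers
  gather_facts: false
  vars:
    portal_url: "https://guest-nac-portal.example.com"
  tasks:
    - name: Read the current guest WLAN configuration
      ansible.netcommon.cli_command:
        command: show running-config wlan guest
      register: current_config

    - name: Flag controllers running the standard (non-custom) configuration
      ansible.builtin.set_fact:
        is_standard: "{{ 'standard-guest-template' in current_config.stdout }}"

    - name: Verify RADIUS reachability to the new NAC servers
      ansible.netcommon.cli_command:
        command: test aaa radius server nac-azure username probe
      register: radius_test
      when: is_standard

    - name: Apply the updated guest portal configuration (standard sites only)
      ansible.netcommon.cli_config:
        config: |
          wlan guest
            captive-portal url {{ portal_url }}
      when: is_standard and 'success' in radius_test.stdout
```

Gating the final task on both the baseline check and a successful RADIUS test is what lets a single playbook run safely across hundreds of controllers while leaving custom, site-specific sites untouched.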

Looking forward

Migrating our NAC infrastructure to Azure has significantly improved the resiliency and manageability of the guest Wi-Fi service at Microsoft. We can now focus our time on developing and enhancing the underlying NAC service on Azure.

Azure allows us to scale NAC services up or down without downtime or the need to plan around hardware constraints and failures. With Azure, we have increased telemetry and disaster recovery capabilities for quicker incident detection and remediation.

We’ve transitioned our NAC services out of the datacenter, removing the need for additional hardware and extensive maintenance efforts, and our migration has been entirely transparent for users.

Our Azure-hosted NAC services have already saved us hundreds of engineering hours, allowing us to focus more on providing an excellent client experience on our guest and corporate networks. We’ll continue to improve NAC and guest Wi-Fi services at Microsoft, ensuring guests visiting Microsoft have access to a reliable and seamless Wi-Fi network in every Microsoft location across the globe.

Key Takeaways

Consider the following takeaways when assessing your organization’s potential for migrating network services to Microsoft Azure.

  • Embrace the cloud for NAC: Move Network Access Control (NAC) services to the cloud with Microsoft Azure to improve scalability, resiliency, and agility while reducing maintenance efforts and costs.
  • Simplify with software-defined networking: Transitioning to software-defined networking in Azure enables you to use pre-configured images and templates, enhancing redundancy and performance across global regions.
  • Enhance connectivity with Azure ExpressRoute: Use Azure ExpressRoute to connect on-premises networks to cloud-hosted services seamlessly, ensuring high performance and low latency without compromising service availability during maintenance.
  • Optimize with Azure Load Balancer and Backup: Azure Load Balancer ensures optimal distribution and health of Azure services, while Azure Backup offers robust business continuity and disaster recovery options.

Try it out

Learn how to create a public load balancer to load balance VMs using an ARM template.

Related links


The post Modernizing our Network Access Control Infrastructure with Azure appeared first on Inside Track Blog.
