Reliability and resilience News and Insights | Microsoft Security Blog

The importance of identity and Microsoft Azure Active Directory resilience

Joy Chik — Tue, 16 Nov 2021 17:00:12 +0000

I love hearing my colleagues explain how they came to the industry because so many of their stories are unusual. I’m surprised how often I hear that people got into computer science by some fortuitous accident. Although he loved computers from the time he was a kid, Oren Melzer never expected to work in the software industry. Today, he’s a Principal Group Engineering Manager in the Identity and Network Access organization, working on one of our team’s most important efforts: resilience.

When he was growing up, Oren’s business-minded parents encouraged him to develop an entrepreneurial spirit. And he did. Oren’s journey reminds us that entrepreneurship isn’t limited to building a new business from scratch, where you start off doing everything yourself. Even though he’s worked in a large organization in a large company for the past several years, Oren has enjoyed participating in many entrepreneurial efforts, including his groundbreaking work on making cloud services resilient, as he tells Nadim Abdo, Corporate Vice President of Identity and Network Access Engineering.

Oren’s interview with Nadim has been edited for clarity and length. We’ve included some video snippets so you can learn more about Oren’s personal journey and his views on the work he does.

Nadim: Oren, I’d like to start by asking what got you into the industry and computers?

Oren: It all started when I was really young. My parents were immigrants from Israel to Louisville, Kentucky. Not a ton of Israelis in Kentucky! My dad was an engineer, so we had computers super early. I’m dating myself, but we had a Commodore that ran Microsoft Disk Operating System (MS-DOS). I was probably five or six, tinkering around on that thing. When my dad showed me Quick Basic interpreter (QBasic), I created a simple little program that would ask, “What is your name?” And you’d say, “Oren.” And it would say, “Hello, Oren.” I remember thinking, “That’s the coolest thing in the world. I can make a computer program that I can talk to!” I loved doing stuff on computers from then on.

Nadim: I have fond memories of QBasic as well. That integrated development environment (IDE) and debugger were pretty awesome. So, you wrote programs from an early age—do you remember any other programs you did?

Oren: Three months after I was born, my parents started a food manufacturing company, which they still run. It’s a family business, so after a few years, they put me to work. But they realized pretty quickly that this computer thing was probably more useful than me putting cans in boxes. So, I became the company computer guy.

They had software to do all their accounting and inventory, and there was a production planning module that cost $40,000. They asked me what I thought, and I said that, with what I knew, I could write it for a lot less money. I was a high schooler, and they basically threw me into this problem. I didn’t have anybody to tell me what to do or how to do it. I wrote a bunch of Visual Basic macros that pulled data from the system, pulled up some editable forms, and then popped out a production plan. That was an entire summer project, 20-plus years ago, and their company still runs on that software to this day. I actually still get tech support calls to fix random bugs.

Nadim: That’s amazing! You must’ve learned the value of customer obsession from that experience. And obviously, this segues to how you now work on some of the most critical services in the entire industry. What learnings from that experience really carried through?

Oren: First, you have to build something that works. I wrote this software when I was 16 or 17 years old, and if it breaks, they can’t produce—30 or 40 people on a factory floor don’t know what to run or they’re scrambling to try to do the same thing manually.

I didn’t know about source control then, but I learned early on to make a backup copy when making changes. If something broke, I’d copy in the one from yesterday that worked. And there’d be weird edge cases, like some new item that the string was too long to fit into how many characters I assumed it could be. So, I learned to be very fault-tolerant, catch errors, and keep on going.

Nadim: When you went to college, what did you choose as your focus?

Oren: I was convinced that software was something everybody was doing. And I like to do things that other people aren’t doing. So, I went into college as a biomedical engineering major. I really wanted to combine the computer thing with biology, another passion I had in high school. I wanted to build medical devices and software for medical devices, pacemakers, and so forth.

A couple of things got me into software. Early on, I met another computer science major, and he became a good friend. He’s actually at Microsoft now. We started a book business together, which we wrote software for.

For a while, I actually thought this thing could be my career, but during our downtime one summer, I looked for a biomedical internship. I couldn’t find one, but who showed up at our company fair? Microsoft. I had my first internship in the identity organization after that. I loved it so much I changed my major. I ended up getting a master’s in computer science and came to Microsoft full-time. I’ve been in identity ever since.

Nadim: That’s wonderful! What do you like best about working in the identity space?

Oren: What’s cool about identity is how foundational it is, like the electric company. Very few people wake up in the morning and say, “I want to use my identity today.” But whatever you do want to do—when you look at all the Microsoft products and applications at any number of businesses—the very first question you always need to ask is, who are you? What is your identity?

Identity enables all those experiences. And when it doesn’t work, people can’t work. I tell people, “I challenge you to find another job where you can impact more people in a day than our identity system does.” We throw around numbers like “billions of authentications” like it’s nothing. That level of impact—that level of making a difference for practically every working person, and many people in college, all over the world—is practically unmatched anywhere else at Microsoft or in the industry, as far as I know.

Nadim: That’s right. The scale is certainly incredible, as is the criticality and security. With that kind of scale, there are obviously enormous technical challenges. And you’ve worked on a number of different areas within identity, right?

Oren: I started on a product called Windows CardSpace, formerly known as InfoCard. It was an identity selector in Windows, where somebody could issue you an identity to use online. To some extent, we were ahead of our time, and eventually, that project was shelved. I moved to developer frameworks and worked on Windows Identity Foundation, which became part of the Microsoft .NET Framework. I also worked on Active Directory Federation Services (AD FS).

My first entry into cloud services was the Access Control Service, which allowed admins to configure federated authentication for their apps. You could authenticate using Microsoft accounts and Google accounts and also secure your application. It was one of the identity organization’s first modern services. And it was really interesting to move from shipping software in a box, which people can download or not, to shipping something that runs all the time and is critical to day-to-day life.

Nadim: And certainly, an absolutely critical journey as part of cloud transformation with everybody using these services. Tell me about your role and what you like best about it?

Oren: I now own an area called “authentication resilience” in identity. We could build the best services in the world, with the most features, but if they’re not up all day, every day, we’re basically failing our customers. And the impact of that is enormous. We’ve learned hard lessons over the years on what can go wrong in a distributed system, so we’ve developed systems that enable us to operate, and continue to operate, in case all kinds of outages occur, whether from networking problems somewhere in Microsoft Azure, a bug that gets released in our system, or key management problems.

We’re building, number one, a set of components to ensure that if the core identity system goes down, users won’t notice. We do that by allowing sessions to live longer, while also being more secure, and to react in real-time. Secondly, we built an entire decorrelated backup authentication stack where we can continue to serve authentications even if the primary system goes down completely. The vast majority of users can stay productive and have no idea that anything has gone wrong.

The goal is to prevent the outage from happening, but if a partial outage does occur, to minimize the impact.

Nadim: How would you say that Microsoft is differentiating our offerings in terms of resilience?

Oren: When we started on this resilience journey a couple of years ago, we weren’t aware of any cross-industry efforts on service resilience. Existing identity standards just assume everything is going to work. With OAuth and security assertion markup language (SAML), you make a request, you get a response. There was no playbook or roadmap for figuring out how to build the next level of real-time signals, more resilience, or backup systems. We weren’t going to wait for one, so we just built it. Ultimately, a working group formed in the OpenID Foundation called Shared Signals and Events, and we actively participated. I went to many of those early meetings, trying to figure out how to build a real-time resilient identity system.

It’s one thing to talk about theory. It’s another to say, “We’ve built this already. Here’s what it looks like.” As a big believer in open standards, I’m proud that we didn’t just say, “The standard must be exactly like what we built, otherwise we’re not going to be on it.” We have actually adapted our implementation to the industry standard. And we’ve been able to get our partners elsewhere in the industry—people who build other software that works with Microsoft Azure Active Directory (Azure AD)—to adopt this standard as well. Now we can say that we have resilience and continuous access, not just for Microsoft properties, but also for many other long-tail apps, built by other people, that we know our customers rely on every day.

Nadim: One of the things that’s awesome about our team is we have so many different individuals with so much talent, with different interests, passions, and ways of looking at the world. How would you describe yourself, your approach, and your strengths?

Oren: People think of software engineers hunched over in a dark room in front of a desk, pounding on a keyboard, looking at ones and zeros on a screen. I like code as much as anybody, but I am a people person. I really thrive on human interaction, on enabling somebody to be successful, and on finding the right project for someone working for me who may be struggling a bit.

The same is true when I think about the impact of the software we build. I don’t just think about the billion requests our backup systems serve today. I think about a billion people who might’ve been frustrated because they couldn’t check their email. And now they can because this backup system kicked in. What motivates me is the people—both the ones I can see in the office and the ones I can’t see. I know they’re there. Knowing that the work I do can make a difference for those people, both in terms of the technology I build and of the people I manage, is extremely motivational for me.

Learn more

Learn more about cloud resilience.

To learn more about Microsoft Security solutions, visit our website. Bookmark the Security blog to keep up with our expert coverage on security matters. Also, follow us at @MSFTSecurity for the latest news and updates on cybersecurity.

The post The importance of identity and Microsoft Azure Active Directory resilience appeared first on Microsoft Security Blog.

Becoming resilient by understanding cybersecurity risks: Part 3—a security pro’s perspective

Mark Simos and Sarah Armstrong-Smith — Wed, 24 Feb 2021 17:00:04 +0000

In part two of this blog series on aligning security with business objectives and risk, we explored the importance of thinking and acting holistically, using the example of human-operated ransomware, which threatens every organization in every industry. As we exited 2020, the Solorigate attack highlighted how attackers are continuously evolving. These nation-state threat actors used an organization’s software supply chain against them, with the attackers compromising legitimate software and applications with malware that installed into target organizations.

In part three of this series, we will further explore what it takes for security leaders to pivot their program from looking at their mission as purely defending against technical attacks to one that focuses on protecting valuable business assets, data, and applications. This pivot will enable business and cybersecurity leaders to remain better aligned and more resilient to a broader spectrum of attack vectors and attacker motivations.

What problem do we face?

First, let’s set a quick baseline on the characteristics of human-operated cyberattacks.

This diagram depicts commonalities and differences between for-profit ransomware and espionage campaigns:

Figure 1: Comparison of human-operated attack campaigns.

Typically, the attackers are:

Flexible: Utilize more than one attack vector to gain entry to the network.
Objective driven: Achieve a defined purpose from accessing your environment. This could be specific to your people, data, or applications, but you may also just fit a class of targets like “a profitable company that is likely to pay to restore access to their data and systems.”
Stealthy: Take precautions to remove evidence or obfuscate their tracks (though at different investment and priority levels, see figure one)
Patient: Take time to perform reconnaissance to understand the infrastructure and business environment.
Well-resourced and skilled in the technologies they are targeting (though the depth of skill can vary).
Experienced: They use established techniques and tools to gain elevated privileges to access or control different aspects of the estate (which grants them the privileges they need to fulfill their objective).

There are variations in the attack style depending on the motivation and objective, but the core methodology is the same. In some ways, this is analogous to the difference between a modern electric car versus a “Mad Max” style vehicle assembled from whatever spare parts were readily and cheaply available.

What to do about it?

Because human attackers are adaptable, a static technology-focused strategy won’t provide the flexibility and agility you need to keep up with (and get ahead of) these attacks. Historically, cybersecurity has tended to focus on the infrastructure, networks, and devices—without necessarily understanding how these technical elements correlate to business objectives and risk.

By understanding the value of information as a business asset, we can take concerted action to prevent compromise and limit risk exposure. Take email, for example, every employee in the company typically uses it, and the majority of communications have limited value to attackers. However, it also contains potentially highly sensitive and legally privileged information (which is why email is often the ultimate target of many sophisticated attacks). Categorizing email through only a technical lens would incorrectly categorize email as either a high-value asset (correct for those few very important items, but impossible to scale) or a low-value asset (correct for most items, but misses the “crown” jewels in email).

Figure 2: Business-centric security.

Security leaders must step back from the technical lens, learn what assets and data are important to business leaders, and prioritize how teams spend their time, attention, and budget through the lens of business importance. The technical lens will be re-applied as the security, and IT teams work through solutions, but looking at this only as a technology problem runs a high risk of solving the wrong problems.

It is a journey to fully understand how business value translates to technical assets, but it’s critical to get started and make this a top priority to end the eternal game of ‘whack-a-mole’ that security plays today.

Security leaders should focus on enabling this transformation by:

Aligning the business in a two-way relationship:

Communicate in their language: explain security threats in business-friendly language and terminology that helps to quantify the risk and impact to the overall business strategy and mission.
Participate in active listening and learning: talk to people across the business to understand the important business services and information and the impact if that were compromised or breached. This will provide clear insight into prioritizing the investment in policies, standards, training, and security controls.

Translating learnings about business priorities and risks into concrete and sustainable actions:

Short term focus on dealing with burning priorities:
- Protecting critical assets and high-value information with appropriate security controls (that increases security while enabling business productivity)
- Focus on immediate and emerging threats that are most likely to cause business impact.
- Monitoring changes in business strategies and initiatives to stay in alignment.

Long term set direction and priorities to make steady progress over time, to improve overall security posture:
- Zero Trust: Create a clear vision, strategy, plan, and architecture for reducing risks in your organization aligned to the zero trust principles of assuming breach, least privilege, and explicit verification. Adopting these principles shifts from static controls to more dynamic risk-based decisions that are based on real-time detections of anomalous behavior irrespective of where the threat derived.
- Burndown technical debt as a consistent strategy by operating security best practices across the organization such as replacing password-based authentication with passwordless and multi-factor authentication (MFA), applying security patches, and retiring (or isolating) legacy systems. Just like paying off a mortgage, you need to make steady payments to realize the full benefit and value of your investments.
- Apply data classifications, sensitivity labels, and role-based access controls to protect data from loss or compromise throughout its lifecycle. While these can’t completely capture the dynamic nature and richness of business context and insight, they are key enablers to guide information protection and governance, limiting the potential impact of an attack.

Establishing a healthy security culture by explicitly practicing, communicating, and publicly modeling the right behavior. The culture should focus on open collaboration between business, IT, and security colleagues and applying a ‘growth mindset’ of continuous learning. Culture changes should be focused on removing siloes from security, IT, and the larger business organization to achieve greater knowledge sharing and resilience levels.

In the next blog of the series, we will explore the most common attack vectors, how and why they work so effectively, and the strategies to mitigate evolving cybersecurity threats.

To learn more about Microsoft Security solutions visit our website. Bookmark the Security blog to keep up with our expert coverage on security matters. Also, follow us at @MSFTSecurity for the latest news and updates on cybersecurity.

The post Becoming resilient by understanding cybersecurity risks: Part 3—a security pro’s perspective appeared first on Microsoft Security Blog.

Why operational resilience will be key in 2021, and how this impacts cybersecurity

Ann Johnson — Thu, 28 Jan 2021 19:00:59 +0000

The lessons we have learned during the past 12 months have demonstrated that the ability to respond to and bounce back from adversity in general, can impact the short-and long-term success of any organization. It can even dictate the leaders and laggards in any industry.

When we take into consideration that as security threats also become more daunting, with many organizations remaining in a remote work environment, global organizations must reach a state where their core operations and services are not disrupted by unexpected changes.

The key to success in surviving any unforeseen circumstances in 2021, will be operational resiliency. Operational resilience is the ability to sustain business operations during any major event, including a cyberattack. It requires a strategic and holistic view of what could go wrong and how an organization will respond. Consider the risk and response for a utility company, for example, an organization that relies on IoT data, or a manufacturer of medical supplies. While their approach may differ, the impact would be equally as devastating should their operational continuity be halted. In today’s digital world, preparing for cyber threats must be a strategic part of that plan just like any other form of continuity and disaster recovery.

Speaking with customers globally, we know they are not fully prepared to withstand a major cyber event. Whilst many firms have a disaster recovery plan on paper, nearly a quarter have never tested that plan and only 42 percent of global executives are confident their organization could recover from a major cyber event without it affecting their business.

It begins with Zero Trust. Zero Trust is based on three principles, verify explicitly, use least privilege access, and assume breach.

Verify explicitly

Rather than trust users or devices implicitly because they’re on the corporate network or VPN’ed into it, it is critical to assume zero trust and verify each transaction explicitly. This means enabling strong authentication and authorization based on all available data points, including user identity, location, device health, service or workload, data classification, and anomalies.

This starts with strong user authentication. Multi-factor authentication (MFA) is essential, but it’s time to move away from passwords plus SMS and voice calls as authentication factors. Bad actors are getting more sophisticated all the time, and they have found a number of ways to exploit the publicly switched telephone networks (PSTN) that SMS and voice calls use as well as some social engineering methods for getting these codes from users.

For most users on their mobile devices, we believe the right answer is passwordless with app-based authentication, like Microsoft Authenticator, or a hardware key combined with biometrics.

Least privileged access

Least privileged access means that when we do grant access, we grant the minimum level of access the user needs to complete their task, and only for the amount of time they need it. Think about it this way, you can let someone into your building, but only during work hours, and you don’t let them into every lab and office.

Identity Governance allows you to balance your organization’s need for security and employee productivity with the right processes and visibility. It provides you with the capabilities to ensure that the right people have the right access to the right resources.

Assume breach

Finally, operate with the expectation of a breach, and apply techniques such as micro-segmentation and real-time analytics to detect attacks more quickly.

In a Zero Trust model, identities—whether they represent people, services, or IoT devices—define the control plane in which access decisions are made. Digital identities, such as transport layer security (TLS) and code signing certificates, SSH keys, secrets, and other cryptographic assets are critical to authentication, signing, and encryption.

That’s why having a strong identity is the critical first step to the success of a Zero Trust security approach.

Embracing Zero Trust allows organizations to harden their defenses while providing employees access to critical data, even during a cyber event. That’s because identity is the foundation of any Zero Trust security strategy because it automatically blocks attacks through adaptive security policies; across users and the accounts, devices, apps, and networks they are using. Identity is the only system that connects all security solutions together so we have end-to-end visibility to prevent, detect, and respond to distributed and sophisticated attacks thanks to cloud technology.

In a Zero Trust model, identities—whether they represent people, services, or IoT devices—define the control plane in which access decisions are made. Digital identities, such as TLS and code signing certificates, SSH keys, secrets, and other cryptographic assets are critical to authentication, signing, and encryption.

“Human identities” such as passwords, biometrics, and other MFA are critical to identifying and authenticate humans. Being a Zero Trust organization also means pervasive use of multi-factor authentication—which we know prevents 99 percent of credential theft and other intelligent authentication methods that make accessing apps easier and more secure than traditional passwords.

Identity is both the foundation for Zero Trust and acts as a catalyst for digital transformation. It automatically blocks attacks through adaptive security policies. It lets people work whenever and wherever they want, using their favorite devices and applications.

That’s because Zero Trust security relies heavily on pervasive threat signals and insights. It is essential to connect the dots and provide greater visibility to prevent, detect and respond to distributed and sophisticated attacks.

Future-proofing your security posture

As security threats become more daunting and many organizations remain in a remote work environment, global organizations must reach a state where their core operations and services will not be disrupted by unexpected global changes.

To maintain operational resilience, organizations should be regularly evaluating their risk threshold. When we talk about risk, this should include an evaluation of an organization’s ability to effectively respond to changes in the crypto landscape, such as a CA compromise, algorithm deprecation, or quantum threats on the horizon.

Bottom line: organizations must have the ability to operationally execute the processes through a combination of human efforts and technology products and services. The ability to do something as simple as restoring from recent backups will be tested in every ransomware attack, and many organizations will fail this test—not because they are not backing up their systems, but because they haven’t tested the quality of their backup procedures or practiced for a cyber event.

Operational resilience guidelines call for demonstrating that concrete measures are in place to deliver resilient services and that both incident management and contingency plans have been tested. Our new normal means that risks are no longer limited to commonly recognized sources such as cybercriminals, malware, or even targeted attacks. Operational resilience is the necessary framework we must have in place in order to maintain business continuity during any unforeseen circumstances in the year ahead.

We want to help empower every organization on the planet by continuing to share our learnings to help you reach the state where core operations and services won’t be disrupted by geopolitical or socioeconomic events, natural disasters, or even cyber events.

The post Why operational resilience will be key in 2021, and how this impacts cybersecurity appeared first on Microsoft Security Blog.

Becoming resilient by understanding cybersecurity risks: Part 2

Mark Simos and Sarah Armstrong-Smith — Thu, 17 Dec 2020 17:00:26 +0000

In part one of this blog series, we looked at how being resilient to cybersecurity threats is about understanding and managing the organizational impact from the evolution of human conflict that has existed since the dawn of humanity. In part two of this series, we further explore the imperative of thinking and acting holistically as a single organization working together to a common goal. Building true resilience begins with framing the issue accurately to the problem at hand and continuously (re)prioritizing efforts to match pace with evolving threats.

For this blog, we will use the example of a current cybersecurity threat that spans every organization in every industry as an example of how to put this into practice. The emergence of human-operated ransomware has created an organizational risk at a pace we have not seen before in cybersecurity. In these extortion attacks, attackers are studying target organizations carefully to learn what critical business processes they can stop to force organizations to pay, and what weaknesses in the IT infrastructure they can exploit to do it.

This type of threat enables attackers to stop most or all critical business operations and demand ransom to restore them by combining:

A highly lucrative extortion business model.
Organization-wide impact utilizing well-establish tools and techniques.

Whilst this may be uncomfortable reading, the ability to pre-empt and respond quickly to these cyberattacks is now an organizational imperative that requires a level of close collaboration and integration throughout your organization (which may not have happened to date).

Because these attacks directly monetize stopping your business operations, you must:

Identify and prioritize monitoring and protection for critical business assets and processes.
Restore business operations as fast as possible, when attacked.

Applying this in a complex organization requires you to:

Know thyself: The first step towards resilience is identifying your critical business assets and processes and ensuring appropriate team members truly understand them so that appropriate controls can be implemented to protect and rapidly restore them. These controls should include business and technical measures such as ensuring immutable or offline backups (as attackers try to eliminate all viable alternatives to paying the ransom, including anti-tampering mechanisms).
This is not a one-time event: Your business and technical teams need to work together to continuously evaluate your security posture relative to the changing threat landscape. This enables you to refine priorities, build mutual trust and strong relationships, and build organizational muscle memory.
Focus on high-impact users: Just as your executives and senior managers have control and access over massive amounts of sensitive and proprietary information that can damage the organization if exposed; IT administrators also have access and control over the business systems and networks that host that information. Ransomware attackers traverse your network and target IT administrator accounts, making the seizure of privileged access a critical component of their attack success. See Microsoft’s guidance on this topic
Build and sustain good hygiene: As we discussed in our first blog, maintaining and updating software and following good security practices is critical to building resilience to these attacks. Because organizations have a backlog of technical debt, it’s critical to prioritize this work to pay off the most important debt first.
Ruthlessly prioritize: Ruthless prioritization applies a calm but urgent mindset to prioritizing tasks to stay on mission. This practice focuses on the most effective actions with the fastest time to value regardless of whether those efforts fit pre-existing plans, perceptions, and habits.
Look through an attacker’s lens: The best way to prioritize your work is to put yourself in the perspective of an attacker. Establishing what information would be valuable to an attacker (or malicious insider), how they would enter your organization and access it, and how they would extract it will give you invaluable insights into how to prioritize your investments and response. Assess the gaps, weaknesses, and vulnerabilities that could be exploited by attackers across the end-to-end business processes and the backend infrastructure that supports them. By modeling the process and systems and what threats attackers can pose to them, you can take the most effective actions to remove or reduce risk to your organization.
Exercise and stress test: This strategy will be tested by attackers in the real world, so you must proactively stress test to find and fix the weaknesses before the attackers find and exploit them. This stress testing must extend to both business processes and technical systems so that organizations build overall resilience to this major risk. This requires systematically removing assumptions in favor of known facts that can be relied upon in a major incident. This should be prioritized based on scenarios that are high impact and high likelihood like human-operated ransomware.

Whilst it’s tempting for experienced leaders and technical professionals to get caught up in how things have been done before, cybersecurity is a fundamentally disruptive force that requires organizations to work collaboratively and adopt and adapt the practices documented in Microsoft’s guidance.

“We cannot solve our problems with the same thinking we used when we created them.”—Albert Einstein

For all this to be successful, your organization must work together as a single coherent entity, sharing insights and resources from business, technical, and security teams to leverage diverse viewpoints and experiences. This approach will help you plan and execute pragmatically and effectively against evolving threats that impact all parts of your organization.

In our next blog, we will continue to explore how to effectively manage risk from the perspective of business and cybersecurity leaders and the capabilities and information required to stay resilient against cyberattacks.

The post Becoming resilient by understanding cybersecurity risks: Part 2 appeared first on Microsoft Security Blog.

Operational resilience in a remote work world

Ann Johnson — Mon, 18 May 2020 16:00:55 +0000

Microsoft CEO Satya Nadella recently said, “We have seen two years’ worth of digital transformation in two months.” This is a result of many organizations having to adapt to the new world of document sharing and video conferencing as they become distributed organizations overnight.

At Microsoft, we understand that while the current health crisis we face together has served as this forcing function, some organizations might not have been ready for this new world of remote work, financially or organizationally. Just last summer, a simple lightning strike caused the U.K.’s National Grid to suffer the biggest blackout in decades. It affected homes across the country, shut down traffic signals, and closed some of the busiest train stations in the middle of the Friday evening rush hour. Trains needed to be manually rebooted causing delays and disruptions. And, when malware shut down the cranes and security gates at Maersk shipping terminals, as well as most of the company’s IT network—from the booking site to systems handling cargo manifests, it took two months to rebuild all the software systems, and three months before all cargo in transit was tracked down—with recovery dependent on a single server having been accidentally offline during the attack due to the power being cut off.

Cybersecurity provides the underpinning to operationally resiliency as more organizations adapt to enabling secure remote work options, whether in the short or long term. And, whether natural or manmade, the difference between success or struggle to any type of disruption requires a strategic combination of planning, response, and recovery. To maintain cyber resilience, one should be regularly evaluating their risk threshold and an organization’s ability to operationally execute the processes through a combination of human efforts and technology products and services.

While my advice is often a three-pronged approach of turning on multi-factor authentication (MFA)—100 percent of your employees, 100 percent of the time—using Secure Score to increase an organization’s security posture and having a mature patching program that includes containment and isolation of devices that cannot be patched, we must also understand that not every organization’s cybersecurity team may be as mature as another.

Organizations must now be able to provide their people with the right resources so they are able to securely access data, from anywhere, 100 percent of the time. Every person with corporate network access, including full-time employees, consultants, and contractors, should be regularly trained to develop a cyber-resilient mindset. They shouldn’t just adhere to a set of IT security policies around identity-based access control, but they should also be alerting IT to suspicious events and infections as soon as possible to help minimize time to remediation.

Our new normal means that risks are no longer limited to commonly recognized sources such as cybercriminals, malware, or even targeted attacks. Moving to a secure remote work environment, without a resilience plan in place that does not include cyber resilience increases an organization’s risk.

Before COVID, we knew that while a majority of firms have a disaster recovery plan on paper, nearly a quarter never test that, and only 42 percent of global executives are confident their organization could recover from a major cyber event without it affecting their business.

Operational resilience cannot be achieved without a true commitment to, and investment in, cyber resilience. We want to help empower every organization on the planet by continuing to share our learnings to help you reach the state where core operations and services won’t be disrupted by geopolitical or socioeconomic events, natural disasters, or even cyber events.

Learn more about our guidance related to COVID-19 here, and bookmark the Security blog to keep up with our expert coverage on security matters. Also, follow us at @MSFTSecurity for the latest news and updates on cybersecurity.

The post Operational resilience in a remote work world appeared first on Microsoft Security Blog.

Operational resilience begins with your commitment to and investment in cyber resilience

Ann Johnson — Tue, 17 Sep 2019 16:00:37 +0000

Operational resilience cannot be achieved without a true commitment to and investment in cyber resilience. Global organizations need to reach the state where their core operations and services won’t be disrupted by geopolitical or socioeconomic events, natural disasters, and cyber events if they are to weather such events.

To help increase stability and lessen the impact to their citizens, an increasing number of government entities have drafted regulations requiring the largest organizations to achieve a true state of operational resilience: where both individual organizations and their industry absorb and adapt to shocks, rather than contributing to them. There are many phenomena that have led to this increased governance, including high-profile cyberattacks like NotPetya, WannaCrypt, and the proliferation of ransomware.

The rise in nation state and cybercrime attacks focusing on critical infrastructure and financial sectors, and the rapid growth of tech innovation pervading more and more industries, join an alarming increase in severe natural disasters, an unstable global geopolitical environment, and global financial market instability on the list of threats organizations should prepare for.

Potential impact of cybercrime attacks

Taken individually, any of these events can cripple critical business and government operations. A lightning strike this summer caused the UK’s National Grid to suffer the biggest blackout in decades. It affected homes across the country, shut down traffic signals, and closed some of the busiest train stations in the middle of the Friday evening rush hour. With trains needing to be manually rebooted, the rhythm of everyday work life was disrupted. The impact of cybercrime attacks can be as significant, and often longer term.

NotPetya cost businesses more than $10 billion; pharmaceutical giant Merck put its bill at $870 million alone. For more than a week, the malware shut down cranes and security gates at Maersk shipping terminals, as well as most of the company’s IT network—from the booking site to systems handling cargo manifests. It took two months to rebuild all the software systems, and three months before all cargo in transit was tracked down—with recovery dependent on a single server having been accidently offline during the attack due to the power being cut off.

The combination of all these threats will cause disruption to businesses and government services on a scale that hasn’t been seen before. Cyber events will also undermine the ability to respond to other types of events, so they need to be treated holistically as part of planning and response.

Extending operational resiliency to cover your cybersecurity program should not mean applying different principles to attacks, outages, and third-party failures than you would to physical attacks and natural hazards. In all cases, the emphasis is on having plans in place to deliver essential services whatever the cause of the disruption. Organizations are responding by rushing to purchase cyber-insurance policies and increasing their spending on cybersecurity. I encourage them to take a step back and have a critical understanding of what those policies actually cover, and to target the investment, so the approach supports operational resilience.

As we continue to witness an unparalleled increase in cyber-related attacks, we should take note that a large majority of the attacks have many factors in common. At Microsoft, we’ve written at length on the controls that best position an organization to defend against and respond to a cyber event.

This recent blog, by Alex Weinert, discusses the importance of Multi-Factor Authentication (MFA). Many attacks are still initiated or escalated via password theft, breach, or reuse via phishing, password spray, brute force attacks, and similar techniques that can be blocked by MFA.
In August 2017, in response to the WannaCrypt and NotPetya attacks, I wrote about tools and services relevant to establishing a cyber resilience program.
The DART team’s ongoing blog series of best practices, based on their learnings from more than 200 global customer incident responses they complete annually.
At the 2018 RSA Conference, we published a guide with E&Y and Edelman with comprehensive best practices for incident response, ranging from the technological to legal and communications.

We must not stand still

The adversary is innovating and accelerating. We must continue to be vigilant and thorough in both security posture, which must be based on “defense in depth,” and in sophistication of response.

The cost of data breaches continues to rise; the global average cost of a data breach is $3.92 million according to the 2019 Ponemon Institute report. This is up 1.5 percent from 2018 and 12 percent higher than in 2014. These continually rising costs have helped galvanize global entities around the topic of operational resilience.

The Bank of England, in July 2018, published comprehensive guidelines on operational resilience that set a robust standard for rigorous controls across all key areas: technology, legal, communications, financial solvency, business continuity, redundancy, failover, governmental, and customer impact, as well as full understanding of what systems and processes underlie your business products and services.

This paper leaves very few stones unturned and includes a clear statement of my thesis—dealing with cyber risk is an important element of operational resilience and you cannot achieve operational resilience without achieving cyber resilience.

Imagine for a moment that your entire network, including all your backups, is impacted by a cyberattack, and you cannot complete even a single customer banking transaction. That’s only one target; it’s not hard to extrapolate from here to attacks that shut down stock trades, real estate transactions, fund transfers, even to attacks on critical infrastructure like healthcare, energy, water system operators. In the event of a major attack, all these essential services will be unavailable until IT systems are restored to at least a baseline of operations.

It doesn’t require professional cybersecurity expertise to understand the impact of shutting down critical services, which is why the new paradigm for cybersecurity must begin not with regulations but with a program to build cyber resilience. The long list of public, wide-reaching cyberattacks where the companies were compliant with required regulations, but still were breached, demonstrates why we can no longer afford to use regulatory requirements as the ultimate driver of cybersecurity.

While it will always be necessary to be fully compliant with regulations like GDPR, SOX, HIPAA, MAS, regional banking regulators, and any others that might be relevant to your industry, it simply isn’t sufficient for a mature cyber program to use this compliance as the only standard. Organizations must build a program that incorporates defense in depth and implements fundamental security controls like MFA, encryption, network segmentation, patching, and isolation and reduction of exceptions. We also must consider how our operations will continue after a catastrophic cyberattack and build systems that can both withstand attack and be instantaneously resilient even during such an attack. The Bank of England uses the mnemonic WAR: for withstand, absorb, recover.

The ability to do something as simple as restoring from recent backups will be tested in every ransomware attack, and many organizations will fail this test—not because they are not backing up their systems, but because they haven’t tested the quality of their backup procedures or practiced for a cyber event. Training is not enough. Operational resilience guidelines call for demonstrating that you have concrete measures in place to deliver resilient services and that both incident management and contingency plans have been tested. You’ll need to invest in scenario planning, tabletop exercises and red/blue team exercises that prove the rigor of your threat modeling and give practice in recovering from catastrophic cyber events.

Importance of a cyber recovery plan

Imagine, if you will, how negligent it would be for your organization to never plan and prepare for a natural disaster. A cyber event is the equivalent: the same physical, legal, operational, technological, human, and communication standards must apply to preparation, response, and recovery. We should all consider it negligence if we do not have a cyber recovery plan in place. Yet, while the majority of firms have a disaster recovery plan on paper, nearly a quarter never test that and only 42 percent of global executives are confident their organization could recover from a major cyber event without it affecting their business.

Cybersecurity often focuses on defending against specific threats and vulnerabilities to mitigate cyber risk, but cyber resilience requires a more strategic and holistic view of what could go wrong and how your organization will address it as whole. The cyber events you’ll face are real threats, and preparing for them must be treated like any other form of continuity and disaster recovery. The challenges to building operational resilience have become more intense in an increasingly hostile cyber environment, and this preparation is a topic we will continue to address.

Bookmark the Security blog to keep up with our expert coverage on security matters. Also, follow us at @MSFTSecurity for the latest news and updates on cybersecurity.

The post Operational resilience begins with your commitment to and investment in cyber resilience appeared first on Microsoft Security Blog.

Announcing: new British Standard for cyber risk and resilience

Microsoft Security Team — Wed, 04 Apr 2018 16:00:35 +0000

Technology is an integral part of the fabric of everyday life. There is almost no organization that does not rely on digital services in some way in order to survive. The opportunity that technology provides also brings with it more vulnerabilities and threats as organizations and data become more connected and available. This trend results in a common gap found in the decision-making process at large organizations. Often information security and cybersecurity have been viewed as a function of IT and therefore, the information security departments have been managed outside of normal business decision-making processes. This is an approach we no longer have the luxury of indulging.

Organizations need a holistic approach to implement digital transformation projects to safeguard their security. This involves focusing on both the opportunity and the threat of any change. To do this effectively the accountability for cyber risk and resilience needs to sit firmly with executive management and the governing body. However, a skills gap exists at this level with many governing body members having started their careers before the internet era. Even when willing to adopt responsibility for building a cyber resilient organization, senior executives are often confused by the technical language that risk management and cybersecurity professionals speak. As well, they may also encourage cybersecurity professionals to speak directly to the board. Therefore, we also need to equip board members with the tools to ask the right questions and ensure the correct levels of risk to build cyber resilient organizations.

That is why, nearly two years ago, the BSI Risk Management Committee started working to develop new guidance aimed at helping executive leadership better understand and manage the technology risks to their organizations. I was asked to lead a group of government executives, regulators, professional bodies and technical experts with a goal of directly addressing the realities and challenges of managing cyber risk in a digital world. This goal led us to draft the new British Standard BS31111. The standard aims to provide guidance to enterprise organizations regarding cyber risk and resilience, and to address the gap in IT decision making.

The standard includes:

Parameters to build concrete guidelines into governing bodies
Identification of areas of focus an organization should have in order to build a cyber resilient enterprise
Assessment questions management can ask to challenge the organization regarding how it is building cyber resilience into the business

Cyber risk and resilience needs to be driven from the top of the organization to ensure that the right culture is set across all business decision making. Executive management must ensure that there is a clear risk and resilience strategy set across the organization, as well as ensuring that there is a strong management structure in place that details the responsibilities and expectations of everyone to maintain security. As Microsoft’s own CEO Satya Nadella has said, “Cybersecurity is like going to the gym. You can’t get better by watching others, you’ve got to get there every day”. Satya’s comments underline the reasoning behind this standard, emphasizing the need to build cyber resilience into day to day operations and not treat it as a standalone project or program.

Engaging with risk management and cyber resilience principles can be complicated and it is easy to get bogged down by technical jargon. To help, we created a visual (figure 1) intended to illustrate the areas required to develop cyber resilience and the key responsibilities of the board.

Source:BS3111:2018 Figure 1

Key tenets:

The responsibility of any Board of Directors is to clearly set the direction of business activity. They ultimately sign off on major decisions and investments and need to ensure that activity is sustainable for the business.
Executive management and the governing body are mostly responsible for the roof and foundation, with oversight on the activity of the pillars. Any building is only as good as its foundation and the same is true for building cyber resilience.

The importance of culture for security

Without a strong culture of security, it is easy for decisions to be made that expose an organization. Many of the major breaches witnessed in recent years can be traced back to a lack of ownership and leadership regarding the need for strong cybersecurity measures across the organization, along with ill-informed investment decisions. The executive management and members of the board need to clearly focus on the benefits of any digital investment AND the level of security outcomes required to support that investment. Hopefully, the new British Standard BS31111 will provide best practice aims and expectations for the responsibility and accountability of boards and executive leadership to drive change.

The publication of the standard is only the first step. It will be important to promote the need for every organization to safeguard their enterprise and their customers, more than we do today. Many boards and governing bodies are becoming more “cyber aware” and understanding their need to build cyber risk into their decision making. This publication aims to enable leadership teams and boards to build awareness and decision-making protocols across the organization.

In my short tenure with Microsoft, I have already witnessed a strong internal security culture, focused on building resilient and secure cloud platforms. I look forward to working with my customers to help them develop their own cyber resilient foundations and cultures, ensuring that Microsoft’s capabilities support them in that endeavor.

Siân serves as Executive Security Advisor for the UK at Microsoft and has worked in the Information Security industry for over 20 years. Siân is a highly requested public speaker and has regularly been on national radio and television including the BBC and Sky News talking about security issues. Siân was appointed an MBE by the Queen in the New Years Honours List for 2018 for services to Cyber Security.

The post Announcing: new British Standard for cyber risk and resilience appeared first on Microsoft Security Blog.

Reliability Series #1: Reliability vs. resilience

David Bills — Mon, 24 Mar 2014 15:52:00 +0000

Whenever I speak to customers and partners about reliability I’m reminded that while objectives and priorities differ between organizations and customers, at the end of the day, everyone wants their service to work. As a customer, you want to be able to do things online, at a time convenient to you. As an organization – or a provider of a service – you want your customers to carry out the tasks they want to, whenever they want to do so.

This article is the first in a four-part series on building a resilient service. In my first two posts, I will discuss the topic as it relates to business strategy, and then we’ll dive deeper into the technical details. The full series of four posts will cover:

1.   Reliability vs. resilience – What is the difference between reliability and resilience and why does it matter?
2.   Common reliability-related threats – DIAL (Discovery, Incorrectness, Authorization/Authentication, Limits/Latency) is a handy mnemonic to help teams brainstorm potential failures of interactions between components for their service in a structured way. Brainstorming about failure modes and failure points is a key phase in resilience modeling and analysis (RMA) and can help teams improve the reliability of their service.
3.   Reliability-enhancing techniques – Taking the “D” and “A” in DIAL, we’ll look at some reliability enhancing techniques you can incorporate into your design related to discovery and authentication.
4.   Reliability-enhancing techniques – Taking the “I” and “L” in DIAL, we’ll look at some reliability enhancing techniques you can incorporate into your design related to incorrectness and limits.

My intention is to provide insight into how Microsoft thinks about reliability and the processes and techniques we’re employing to improve the reliability of our services for our customers.

So what is reliability? When I ask customers and partners, the most common responses refer to consistency in performance, speed, availability – and perhaps most significantly –resilience. One thing we all agree on is that for a system or service to be reliable, the user has to believe ‘it just works’.

The Institute of Electrical and Electronics Engineers (IEEE) Reliability Society states reliability [engineering] is “a design engineering discipline which applies scientific knowledge to assure that a system will perform its intended function for the required duration within a given environment, including the ability to test and support the system through its total lifecycle.” For software, it defines reliability as “the probability of failure-free software operation for a specified period of time in a specified environment.”

A reliable cloud service is essentially one that functions as the designer intended it to, when it is expected to, and wherever the customer is connected. That’s not to say every component must operate flawlessly 100 percent of the time. This last point brings us to what I believe is the difference between reliability and resiliency.

Reliability is the outcome cloud service providers strive for – it’s the result. Resiliency is the ability of a cloud-based service to withstand certain types of failure and yet remain functional from the customer perspective. In other words, reliability is the outcome and resilience is the way you achieve the outcome. A service could be characterized as reliable simply because no part of the service has ever failed, and yet the service couldn’t be regarded as resilient because those reliability-enhancing capabilities may never have been tested.

The key takeaway here is the importance of focusing on resilience and designing and building resiliency into your service at every stage of the software development lifecycle. To find out more about the fundamentals of building a reliable online service, read our whitepaper ‘An introduction to designing reliable cloud services’.

**Next up: Reliability Series #2: Categorizing reliability threats to your service

The post Reliability Series #1: Reliability vs. resilience appeared first on Microsoft Security Blog.

How do I know if I already have antivirus software?

Eve Blakemore — Fri, 21 Feb 2014 17:14:00 +0000

UPDATE: For the most relevant information on antivirus software, learn how to protect your computer by visiting the Microsoft support page on Windows Defender. We can ensure this page stays updated with the most relevant information on how to help protect your PC with Windows Defender.

If your computer is running Windows 8

If your computer is running Windows 8, you already have antivirus software. Windows 8 includes Windows Defender, which helps protect you from viruses, spyware, and other malicious software.

If Windows Defender is turned off and you don’t have another antivirus program installed (or your other antivirus program is not working), you will see a warning in the notification area on your taskbar.

If your computer is running Windows 7

Windows 7 includes spyware protection, but to protect against viruses you can download Microsoft Security Essentials for free.

To find out if you already have antivirus software:

Open Action Center by clicking the Start button , clicking Control Panel, and then, under System and Security, clicking Review your computer’s status.
Click the arrow button next to Security to expand the section.

If Windows can detect your antivirus software, it’s listed under Virus protection.

Windows doesn’t detect all antivirus software, and some antivirus software doesn’t report its status to Windows. If your antivirus software isn’t displayed in Action Center and you’re not sure how to find it, try any of the following:

Type the name of the software or the publisher in the Search box on the Start menu.
Look for your antivirus program’s icon in the notification area of the taskbar.

If your computer is running Windows Vista

Windows Vista does not include virus protection. To protect against viruses, you can download Microsoft Security Essentials for free.

The status of your antivirus software is typically displayed in Windows Security Center.

Open Security Center by clicking the Start button , clicking Control Panel, clicking Security, and then clicking Security Center.
Click Malware protection.

If Windows can detect your antivirus software, it will be listed under Virus protection.

Windows does not detect all antivirus software, and some antivirus software doesn’t report its status to Windows. If your antivirus software is not displayed in Windows Security Center and you’re not sure how to find it, try any of the following:

Look for the antivirus software in the list of programs on the Start menu.
Type the name of the software or the publisher in the Search box on the Start menu.
Look for the icon in the notification area of the taskbar.

If your computer is running Windows XP

Click the security icon on the taskbar, or click Start, select Control Panel, and then double-click Security Center.

On April 8, 2014, Microsoft will end support for Windows XP. This means that after April 8, there will be no new security updates available through automatic updating for computers that are still running Windows XP.

Also on this date, Microsoft will stop providing Microsoft Security Essentials for download on Windows XP. (If you already have Microsoft Security Essentials installed, you will continue to receive antimalware signature updates for a limited time, but this does not mean that your PC will be secure because Microsoft will no longer provide security updates to protect it.)

For more information, see Support is ending soon.

I don’t know what operating system my computer is running

Find out what operating system your computer is running

The post How do I know if I already have antivirus software? appeared first on Microsoft Security Blog.