Ram Shankar Siva Kumar, Author at Microsoft Security Blog

3 takeaways from red teaming 100 generative AI products

Blake Bullwinkel and Ram Shankar Siva Kumar — Mon, 13 Jan 2025 16:00:00 +0000

Microsoft’s AI red team is excited to share our whitepaper, “Lessons from Red Teaming 100 Generative AI Products.”

The AI red team was formed in 2018 to address the growing landscape of AI safety and security risks. Since then, we have expanded the scope and scale of our work significantly. We are one of the first red teams in the industry to cover both security and responsible AI, and red teaming has become a key part of Microsoft’s approach to generative AI product development. Red teaming is the first step in identifying potential harms and is followed by important initiatives at the company to measure, manage, and govern AI risk for our customers. Last year, we also announced PyRIT (The Python Risk Identification Tool for generative AI), an open-source toolkit to help researchers identify vulnerabilities in their own AI systems.

Pie chart showing the percentage breakdown of products tested by the Microsoft AI red team. As of October 2024, we had red teamed more than 100 generative AI products.

With a focus on our expanded mission, we have now red-teamed more than 100 generative AI products. The whitepaper we are now releasing provides more detail about our approach to AI red teaming and includes the following highlights:

Our AI red team ontology, which we use to model the main components of a cyberattack including adversarial or benign actors, TTPs (Tactics, Techniques, and Procedures), system weaknesses, and downstream impacts. This ontology provides a cohesive way to interpret and disseminate a wide range of safety and security findings.
Eight main lessons learned from our experience red teaming more than 100 generative AI products. These lessons are geared towards security professionals looking to identify risks in their own AI systems, and they shed light on how to align red teaming efforts with potential harms in the real world.
Five case studies from our operations, which highlight the wide range of vulnerabilities that we look for including traditional security, responsible AI, and psychosocial harms. Each case study demonstrates how our ontology is used to capture the main components of an attack or system vulnerability.

Lessons from Red Teaming 100 Generative AI Products

Discover more about our approach to AI red teaming.

Read the whitepaper

Microsoft AI red team tackles a multitude of scenarios

Over the years, the AI red team has tackled a wide assortment of scenarios that other organizations have likely encountered as well. We focus on vulnerabilities most likely to cause harm in the real world, and our whitepaper shares case studies from our operations that highlight how we have done this in four scenarios including security, responsible AI, dangerous capabilities (such as a model’s ability to generate hazardous content), and psychosocial harms. As a result, we are able to recognize a variety of potential cyberthreats and adapt quickly when confronting new ones.

This mission has given our red team a breadth of experiences to skillfully tackle risks regardless of:

System type, including Microsoft Copilot, models embedded in systems, and open-source models.
Modality, whether text-to-text, text-to-image, or text-to-video.
User type—enterprise user risk, for example, is different from consumer risks and requires a unique red teaming approach. Niche audiences, such as for a specific industry like healthcare, also deserve a nuanced approach.

Top three takeaways from the whitepaper

AI red teaming is a practice for probing the safety and security of generative AI systems. Put simply, we “break” the technology so that others can build it back stronger. Years of red teaming have given us invaluable insight into the most effective strategies. In reflecting on the eight lessons discussed in the whitepaper, we can distill three top takeaways that business leaders should know.

Takeaway 1: Generative AI systems amplify existing security risks and introduce new ones

The integration of generative AI models into modern applications has introduced novel cyberattack vectors. However, many discussions around AI security overlook existing vulnerabilities. AI red teams should pay attention to cyberattack vectors both old and new.

Existing security risks: Application security risks often stem from improper security engineering practices including outdated dependencies, improper error handling, credentials in source, lack of input and output sanitization, and insecure packet encryption. One of the case studies in our whitepaper describes how an outdated FFmpeg component in a video processing AI application introduced a well-known security vulnerability called server-side request forgery (SSRF), which could allow an adversary to escalate their system privileges.

Illustration of the SSRF vulnerability in the video-processing generative AI application.

Model-level weaknesses: AI models have expanded the cyberattack surface by introducing new vulnerabilities. Prompt injections, for example, exploit the fact that AI models often struggle to distinguish between system-level instructions and user data. Our whitepaper includes a red teaming case study about how we used prompt injections to trick a vision language model.

Red team tip: AI red teams should be attuned to new cyberattack vectors while remaining vigilant for existing security risks. AI security best practices should include basic cyber hygiene.

Takeaway 2: Humans are at the center of improving and securing AI

While automation tools are useful for creating prompts, orchestrating cyberattacks, and scoring responses, red teaming can’t be automated entirely. AI red teaming relies heavily on human expertise.

Humans are important for several reasons, including:

Subject matter expertise: LLMs are capable of evaluating whether an AI model response contains hate speech or explicit sexual content, but they’re not as reliable at assessing content in specialized areas like medicine, cybersecurity, and CBRN (chemical, biological, radiological, and nuclear). These areas require subject matter experts who can evaluate content risk for AI red teams.
Cultural competence: Modern language models use primarily English training data, performance benchmarks, and safety evaluations. However, as AI models are deployed around the world, it is crucial to design red teaming probes that not only account for linguistic differences but also redefine harms in different political and cultural contexts. These methods can be developed only through the collaborative effort of people with diverse cultural backgrounds and expertise.
Emotional intelligence: In some cases, emotional intelligence is required to evaluate the outputs of AI models. One of the case studies in our whitepaper discusses how we are probing for psychosocial harms by investigating how chatbots respond to users in distress. Ultimately, only humans can fully assess the range of interactions that users might have with AI systems in the wild.

Red team tip: Adopt tools like PyRIT to scale up operations but keep humans in the red teaming loop for the greatest success at identifying impactful AI safety and security vulnerabilities.

Takeaway 3: Defense in depth is key for keeping AI systems safe

Numerous mitigations have been developed to address the safety and security risks posed by AI systems. However, it is important to remember that mitigations do not eliminate risk entirely. Ultimately, AI red teaming is a continuous process that should adapt to the rapidly evolving risk landscape and aim to raise the cost of successfully attacking a system as much as possible.

Novel harm categories: As AI systems become more sophisticated, they often introduce entirely new harm categories. For example, one of our case studies explains how we probed a state-of-the-art LLM for risky persuasive capabilities. AI red teams must constantly update their practices to anticipate and probe for these novel risks.
Economics of cybersecurity: Every system is vulnerable because humans are fallible, and adversaries are persistent. However, you can deter adversaries by raising the cost of attacking a system beyond the value that would be gained. One way to raise the cost of cyberattacks is by using break-fix cycles.¹ This involves undertaking multiple rounds of red teaming, measurement, and mitigation—sometimes referred to as “purple teaming”—to strengthen the system to handle a variety of attacks.
Government action: Industry action to defend against cyberattackers and
failures is one side of the AI safety and security coin. The other side is
government action in a way that could deter and discourage these broader
failures. Both public and private sectors need to demonstrate commitment and vigilance, ensuring that cyberattackers no longer hold the upper hand and society at large can benefit from AI systems that are inherently safe and secure.

Red team tip: Continually update your practices to account for novel harms, use break-fix cycles to make AI systems as safe and secure as possible, and invest in robust measurement and mitigation techniques.

Advance your AI red teaming expertise

The “Lessons From Red Teaming 100 Generative AI Products” whitepaper includes our AI red team ontology, additional lessons learned, and five case studies from our operations. We hope you will find the paper and the ontology useful in organizing your own AI red teaming exercises and developing further case studies by taking advantage of PyRIT, our open-source automation framework.

Together, the cybersecurity community can refine its approaches and share best practices to effectively address the challenges ahead. Download our red teaming whitepaper to read more about what we’ve learned. As we progress along our own continuous learning journey, we would welcome your feedback and hearing about your own AI red teaming experiences.

Learn more with Microsoft Security

To learn more about Microsoft Security solutions, visit our website. Bookmark the Security blog to keep up with our expert coverage on security matters. Also, follow us on LinkedIn (Microsoft Security) and X (@MSFTSecurity) for the latest news and updates on cybersecurity.

¹ Phi-3 Safety Post-Training: Aligning Language Models with a “Break-Fix” Cycle

The post 3 takeaways from red teaming 100 generative AI products appeared first on Microsoft Security Blog.

Announcing Microsoft’s open automation framework to red team generative AI Systems

Ram Shankar Siva Kumar — Thu, 22 Feb 2024 17:00:00 +0000

Today we are releasing an open automation framework, PyRIT (Python Risk Identification Toolkit for generative AI), to empower security professionals and machine learning engineers to proactively find risks in their generative AI systems.

At Microsoft, we believe that security practices and generative AI responsibilities need to be a collaborative effort. We are deeply committed to developing tools and resources that enable every organization across the globe to innovate responsibly with the latest artificial intelligence advances. This tool, and the previous investments we have made in red teaming AI since 2019, represents our ongoing commitment to democratize securing AI for our customers, partners, and peers.

The need for automation in AI Red Teaming

Red teaming AI systems is a complex, multistep process. Microsoft’s AI Red Team leverages a dedicated interdisciplinary group of security, adversarial machine learning, and responsible AI experts. The Red Team also leverages resources from the entire Microsoft ecosystem, including the Fairness center in Microsoft Research; AETHER, Microsoft’s cross-company initiative on AI Ethics and Effects in Engineering and Research; and the Office of Responsible AI. Our red teaming is part of our larger strategy to map AI risks, measure the identified risks, and then build scoped mitigations to minimize them.

Over the past year, we have proactively red teamed several high-value generative AI systems and models before they were released to customers. Through this journey, we found that red teaming generative AI systems is markedly different from red teaming classical AI systems or traditional software in three prominent ways.

1. Probing both security and responsible AI risks simultaneously

We first learned that while red teaming traditional software or classical AI systems mainly focuses on identifying security failures, red teaming generative AI systems includes identifying both security risk as well as responsible AI risks. Responsible AI risks, like security risks, can vary widely, ranging from generating content that includes fairness issues to producing ungrounded or inaccurate content. AI red teaming needs to explore the potential risk space of security and responsible AI failures simultaneously.

2. Generative AI is more probabilistic than traditional red teaming

Secondly, we found that red teaming generative AI systems is more probabilistic than traditional red teaming. Put differently, executing the same attack path multiple times on traditional software systems would likely yield similar results. However, generative AI systems have multiple layers of non-determinism; in other words, the same input can provide different outputs. This could be because of the app-specific logic; the generative AI model itself; the orchestrator that controls the output of the system can engage different extensibility or plugins; and even the input (which tends to be language), with small variations can provide different outputs. Unlike traditional software systems with well-defined APIs and parameters that can be examined using tools during red teaming, we learned that generative AI systems require a strategy that considers the probabilistic nature of their underlying elements.

3. Generative AI systems architecture varies widely

Finally, the architecture of these generative AI systems varies widely: from standalone applications to integrations in existing applications to the input and output modalities, such as text, audio, images, and videos.

These three differences make a triple threat for manual red team probing. To surface just one type of risk (say, generating violent content) in one modality of the application (say, a chat interface on browser), red teams need to try different strategies multiple times to gather evidence of potential failures. Doing this manually for all types of harms, across all modalities across different strategies, can be exceedingly tedious and slow.

This does not mean automation is always the solution. Manual probing, though time-consuming, is often needed for identifying potential blind spots. Automation is needed for scaling but is not a replacement for manual probing. We use automation in two ways to help the AI red team: automating our routine tasks and identifying potentially risky areas that require more attention.

In 2021, Microsoft developed and released a red team automation framework for classical machine learning systems. Although Counterfit still delivers value for traditional machine learning systems, we found that for generative AI applications, Counterfit did not meet our needs, as the underlying principles and the threat surface had changed. Because of this, we re-imagined how to help security professionals to red team AI systems in the generative AI paradigm and our new toolkit was born.

We like to acknowledge out that there have been work in the academic space to automate red teaming such as PAIR and open source projects including garak.

PyRIT for generative AI Red teaming

PyRIT is battle-tested by the Microsoft AI Red Team. It started off as a set of one-off scripts as we began red teaming generative AI systems in 2022. As we red teamed different varieties of generative AI systems and probed for different risks, we added features that we found useful. Today, PyRIT is a reliable tool in the Microsoft AI Red Team’s arsenal.

The biggest advantage we have found so far using PyRIT is our efficiency gain. For instance, in one of our red teaming exercises on a Copilot system, we were able to pick a harm category, generate several thousand malicious prompts, and use PyRIT’s scoring engine to evaluate the output from the Copilot system all in the matter of hours instead of weeks.

PyRIT is not a replacement for manual red teaming of generative AI systems. Instead, it augments an AI red teamer’s existing domain expertise and automates the tedious tasks for them. PyRIT shines light on the hot spots of where the risk could be, which the security professional than can incisively explore. The security professional is always in control of the strategy and execution of the AI red team operation, and PyRIT provides the automation code to take the initial dataset of harmful prompts provided by the security professional, then uses the LLM endpoint to generate more harmful prompts.

However, PyRIT is more than a prompt generation tool; it changes its tactics based on the response from the generative AI system and generates the next input to the generative AI system. This automation continues until the security professional’s intended goal is achieved.

PyRIT components

Abstraction and Extensibility is built into PyRIT. That’s because we always want to be able to extend and adapt PyRIT’s capabilities to new capabilities that generative AI models engender. We achieve this by five interfaces: target, datasets, scoring engine, the ability to support multiple attack strategies and providing the system with memory.

Targets: PyRIT supports a variety of generative AI target formulations—be it as a web service or embedded in application. PyRIT out of the box supports text-based input and can be extended for other modalities as well. PyRIT supports integrating with models from Microsoft Azure OpenAI Service, Hugging Face, and Azure Machine Learning Managed Online Endpoint, effectively acting as an adaptable bot for AI red team exercises on designated targets, supporting both single and multi-turn interactions.
Datasets: This is where the security professional encodes what they want the system to be probed for. It could either be a static set of malicious prompts or a dynamic prompt template. Prompt templates allow the security professionals to automatically encode multiple harm categories—security and responsible AI failures—and leverage automation to pursue harm exploration in all categories simultaneously. To get users started, our initial release includes prompts that contain well-known, publicly available jailbreaks from popular sources.
Extensible scoring engine: The scoring engine behind PyRIT offers two options for scoring the outputs from the target AI system: using a classical machine learning classifier or using an LLM endpoint and leveraging it for self-evaluation. Users can also use Azure AI Content filters as an API directly.
Extensible attack strategy: PyRIT supports two styles of attack strategy. The first is single-turn; in other words, PyRIT sends a combination of jailbreak and harmful prompts to the AI system and scores the response. It also supports multiturn strategy, in which the system sends a combination of jailbreak and harmful prompts to the AI system, scores the response, and then responds to the AI system based on the score. While single-turn attack strategies are faster in computation time, multiturn red teaming allows for more realistic adversarial behavior and more advanced attack strategies.
Memory: PyRIT’s tool enables the saving of intermediate input and output interactions providing users with the capability for in-depth analysis later on. The memory feature facilitates the ability to share the conversations explored by the PyRIT agent and increases the range explored by the agents to facilitate longer turn conversations.

Get started with PyRIT

PyRIT was created in response to our belief that the sharing of AI red teaming resources across the industry raises all boats. We encourage our peers across the industry to spend time with the toolkit and see how it can be adopted for red teaming your own generative AI application.

Get started with the PyRIT project here. To get acquainted with the toolkit, our initial release has a list of demos including common scenarios notebooks, including how to use PyRIT to automatically jailbreak using Lakera’s popular Gandalf game.
We are hosting a webinar on PyRIT to demonstrate how to use it in red teaming generative AI systems. If you would like to see PyRIT in action, please register for our webinar in partnership with the Cloud Security Alliance.
Learn more about what Microsoft’s AI Red Team is doing and explore more resources on how you can better prepare your organization for securing AI.
Watch Microsoft Secure online to explore more product innovations to help you take advantage of AI safely, responsibly, and securely.

Contributors

Project created by Gary Lopez; Engineering: Richard Lundeen, Roman Lutz, Raja Sekhar Rao Dheekonda, Dr. Amanda Minnich; Broader involvement from Shiven Chawla, Pete Bryan, Peter Greko, Tori Westerhoff, Martin Pouliot, Bolor-Erdene Jagdagdorj, Chang Kawaguchi, Charlotte Siska, Nina Chikanov, Steph Ballard, Andrew Berkley, Forough Poursabzi, Xavier Fernandes, Dean Carignan, Kyle Jackson, Federico Zarfati, Jiayuan Huang, Chad Atalla, Dan Vann, Emily Sheng, Blake Bullwinkel, Christiano Bianchet, Keegan Hines, eric douglas, Yonatan Zunger, Christian Seifert, Ram Shankar Siva Kumar. Grateful for comments from Jonathan Spring.

Learn more

Explore data security resources and trends

Gain insights into the latest data security advancements, including expert guidance, best practices, trends, and solutions.

Get started

The post Announcing Microsoft’s open automation framework to red team generative AI Systems appeared first on Microsoft Security Blog.

Microsoft AI Red Team building future of safer AI

Ram Shankar Siva Kumar — Mon, 07 Aug 2023 15:00:00 +0000

An essential part of shipping software securely is red teaming. It broadly refers to the practice of emulating real-world adversaries and their tools, tactics, and procedures to identify risks, uncover blind spots, validate assumptions, and improve the overall security posture of systems. Microsoft has a rich history of red teaming emerging technology with a goal of proactively identifying failures in the technology. As AI systems became more prevalent, in 2018, Microsoft established the AI Red Team: a group of interdisciplinary experts dedicated to thinking like attackers and probing AI systems for failures.

We’re sharing best practices from our team so others can benefit from Microsoft’s learnings. These best practices can help security teams proactively hunt for failures in AI systems, define a defense-in-depth approach, and create a plan to evolve and grow your security posture as generative AI systems evolve.

The practice of AI red teaming has evolved to take on a more expanded meaning: it not only covers probing for security vulnerabilities, but also includes probing for other system failures, such as the generation of potentially harmful content. AI systems come with new risks, and red teaming is core to understanding those novel risks, such as prompt injection and producing ungrounded content. AI red teaming is not just a nice to have at Microsoft; it is a cornerstone to responsible AI by design: as Microsoft President and Vice Chair, Brad Smith, announced, Microsoft recently committed that all high-risk AI systems will go through independent red teaming before deployment.

The goal of this blog is to contextualize for security professionals how AI red teaming intersects with traditional red teaming, and where it differs. This, we hope, will empower more organizations to red team their own AI systems as well as provide insights into leveraging their existing traditional red teams and AI teams better.

Red teaming helps make AI implementation safer

Over the last several years, Microsoft’s AI Red Team has continuously created and shared content to empower security professionals to think comprehensively and proactively about how to implement AI securely. In October 2020, Microsoft collaborated with MITRE as well as industry and academic partners to develop and release the Adversarial Machine Learning Threat Matrix, a framework for empowering security analysts to detect, respond, and remediate threats. Also in 2020, we created and open sourced Microsoft Counterfit, an automation tool for security testing AI systems to help the whole industry improve the security of AI solutions. Following that, we released the AI security risk assessment framework in 2021 to help organizations mature their security practices around the security of AI systems, in addition to updating Counterfit. Earlier this year, we announced additional collaborations with key partners to help organizations understand the risks associated with AI systems so that organizations can use them safely, including the integration of Counterfit into MITRE tooling, and collaborations with Hugging Face on an AI-specific security scanner that is available on GitHub.

Security-related AI red teaming is part of a larger responsible AI (RAI) red teaming effort that focuses on Microsoft’s AI principles of fairness, reliability and safety, privacy and security, inclusiveness, transparency, and accountability. The collective work has had a direct impact on the way we ship AI products to our customers. For instance, before the new Bing chat experience was released, a team of dozens of security and responsible AI experts across the company spent hundreds of hours probing for novel security and responsible AI risks. This was in addition to the regular, intensive software security practices followed by the team, as well as red teaming the base GPT-4 model by RAI experts in advance of developing Bing Chat. Our red teaming findings informed the systematic measurement of these risks and built scoped mitigations before the product shipped.

Guidance and resources for red teaming

AI red teaming generally takes place at two levels: at the base model level (e.g., GPT-4) or at the application level (e.g., Security Copilot, which uses GPT-4 in the back end). Both levels bring their own advantages: for instance, red teaming the model helps to identify early in the process how models can be misused, to scope capabilities of the model, and to understand the model’s limitations. These insights can be fed into the model development process to improve future model versions but also get a jump-start on which applications it is most suited for. Application-level AI red teaming takes a system view, of which the base model is one part. For instance, when AI red teaming Bing Chat, the entire search experience powered by GPT-4 was in scope and was probed for failures. This helps to identify failures beyond just the model-level safety mechanisms, by including the overall application specific safety triggers.

Together, probing for both security and responsible AI risks provides a single snapshot of how threats and even benign usage of the system can compromise the integrity, confidentiality, availability, and accountability of AI systems. This combined view of security and responsible AI provides valuable insights not just in proactively identifying issues, but also to understand their prevalence in the system through measurement and inform strategies for mitigation. Below are key learnings that have helped shape Microsoft’s AI Red Team program.

AI red teaming is more expansive. AI red teaming is now an umbrella term for probing both security and RAI outcomes. AI red teaming intersects with traditional red teaming goals in that the security component focuses on model as a vector. So, some of the goals may include, for instance, to steal the underlying model. But AI systems also inherit new security vulnerabilities, such as prompt injection and poisoning, which need special attention. In addition to the security goals, AI red teaming also includes probing for outcomes such as fairness issues (e.g., stereotyping) and harmful content (e.g., glorification of violence). AI red teaming helps identify these issues early so we can prioritize our defense investments appropriately.
AI red teaming focuses on failures from both malicious and benign personas. Take the case of red teaming new Bing. In the new Bing, AI red teaming not only focused on how a malicious adversary can subvert the AI system via security-focused techniques and exploits, but also on how the system can generate problematic and harmful content when regular users interact with the system. So, unlike traditional security red teaming, which mostly focuses on only malicious adversaries, AI red teaming considers broader set of personas and failures.
AI systems are constantly evolving. AI applications routinely change. For instance, in the case of a large language model application, developers may change the metaprompt (underlying instructions to the ML model) based on feedback. While traditional software systems also change, in our experience, AI systems change at a faster rate. Thus, it is important to pursue multiple rounds of red teaming of AI systems and to establish systematic, automated measurement and monitor systems over time.
Red teaming generative AI systems requires multiple attempts. In a traditional red teaming engagement, using a tool or technique at two different time points on the same input, would always produce the same output. In other words, generally, traditional red teaming is deterministic. Generative AI systems, on the other hand, are probabilistic. This means that running the same input twice may provide different outputs. This is by design because the probabilistic nature of generative AI allows for a wider range in creative output. This also makes it tricky to red teaming since a prompt may not lead to failure in the first attempt, but be successful (in surfacing security threats or RAI harms) in the succeeding attempt. One way we have accounted for this is, as Brad Smith mentioned in his blog, to pursue multiple rounds of red teaming in the same operation. Microsoft has also invested in automation that helps to scale our operations and a systemic measurement strategy that quantifies the extent of the risk.
Mitigating AI failures requires defense in depth. Just like in traditional security where a problem like phishing requires a variety of technical mitigations such as hardening the host to smartly identifying malicious URIs, fixing failures found via AI red teaming requires a defense-in-depth approach, too. This involves the use of classifiers to flag potentially harmful content to using metaprompt to guide behavior to limiting conversational drift in conversational scenarios.

Building technology responsibly and securely is in Microsoft’s DNA. Last year, Microsoft celebrated the 20-year anniversary of the Trustworthy Computing memo that asked Microsoft to deliver products “as available, reliable and secure as standard services such as electricity, water services, and telephony.” AI is shaping up to be the most transformational technology of the 21st century. And like any new technology, AI is subject to novel threats. Earning customer trust by safeguarding our products remains a guiding principle as we enter this new era – and the AI Red Team is front and center of this effort. We hope this blog post inspires others to responsibly and safely integrate AI via red teaming.

Resources

AI red teaming is part of the broader Microsoft strategy to deliver AI systems securely and responsibly. Here are some other resources to provide insights into this process:

For customers who are building applications using Azure OpenAI models, we released a guide to help them assemble an AI red team, define scope and goals, and execute on the deliverables.
For security incident responders, we released a bug bar to systematically triage attacks on ML systems.
For ML engineers, we released a checklist to complete AI risk assessment.
For developers, we released threat modeling guidance specifically for ML systems.
For anyone interested in learning more about responsible AI, we’ve released a version of our Responsible AI Standard and Impact Assessment, among other resources.
For engineers and policymakers, Microsoft, in collaboration with Berkman Klein Center at Harvard University, released a taxonomy documenting various machine learning failure modes.
For the broader security community, Microsoft hosted the annual Machine Learning Evasion Competition.
For Azure Machine Learning customers, we provided guidance on enterprise security and governance.

Contributions from Steph Ballard, Forough Poursabzi, Amanda Minnich, Gary Lopez Munoz, and Chang Kawaguchi.

The post Microsoft AI Red Team building future of safer AI appeared first on Microsoft Security Blog.

Best practices for AI security risk management

Ram Shankar Siva Kumar — Thu, 09 Dec 2021 21:00:43 +0000

Today, we are releasing an AI security risk assessment framework as a step to empower organizations to reliably audit, track, and improve the security of the AI systems. In addition, we are providing new updates to Counterfit, our open-source tool to simplify assessing the security posture of AI systems.

There is a marked interest in securing AI systems from adversaries. Counterfit has been heavily downloaded and explored by organizations of all sizes—from startups to governments and large-scale organizations—to proactively secure their AI systems. From a different vantage point, the Machine Learning Evasion Competition we organized to help security professionals exercise their muscles to defend and attack AI systems in a realistic setting saw record participation, doubling the amount of participants and techniques than the previous year.

This interest demonstrates the growth mindset and opportunity in securing AI systems. But how do we harness interest into action that can raise the security posture of AI systems? When the rubber hits the road, how can a security engineer think about mitigating the risk of an AI system being compromised?

AI security risk assessment framework

The deficit is clear: according to Gartner® Market Guide for AI Trust, Risk and Security Management published in September 2021, “AI poses new trust, risk and security management requirements that conventional controls do not address.”¹ To address this gap, we did not want to invent a new process. We acknowledge that security professionals are already overwhelmed. Moreover, we believe that even though the attacks on AI systems pose a new security risk, current software security practices are relevant and can be adapted to manage this novel risk. To that end, we fashioned our AI security risk assessment in the spirit of the current security risk assessment frameworks.

We believe that to comprehensively assess the security risk for an AI system, we need to look at the entire lifecycle of system development and deployment. An overreliance on securing machine learning models through academic adversarial machine learning oversimplifies the problem in practice. This means, to truly secure the AI model, we need to account for securing the entire supply chain and management of AI systems.

Through our own operations experience in building and red teaming models at Microsoft, we recognize that securing AI systems is a team sport. AI researchers design model architectures. Machine learning engineers build data ingestion, model training, and deployment pipelines. Security architects establish appropriate security policies. Security analysts respond to threats. To that end, we envisioned a framework that would involve participation from each of these stakeholders.

“Designing and developing secure AI is a cornerstone of AI product development at Boston Consulting Group (BCG). As the societal need to secure our AI systems becomes increasingly apparent, assets like Microsoft’s AI security risk management framework can be foundational contributions. We already implement best practices found in this framework in the AI systems we develop for our clients and are excited that Microsoft has developed and open sourced this framework for the benefit of the entire industry.”—Jack Molloy, Senior Security Engineer, BCG

As a result of our Microsoft-wide collaboration, our framework features the following characteristics:

Provides a comprehensive perspective to AI system security. We looked at each element of the AI system lifecycle in a production setting: from data collection, data processing, to model deployment. We also accounted for AI supply chains, as well as the controls and policies with respect to backup, recovery, and contingency planning related to AI systems.
Outlines machine learning threats and recommendations to abate them. To directly help engineers and security professionals, we enumerated the threat statement at each step of the AI system building process. Next, we provided a set of best practices that overlay and reinforce existing software security practices in the context of securing AI systems.
Enables organizations to conduct risk assessments. The framework provides the ability to gather information about the current state of security of AI systems in an organization, perform gap analysis, and track the progress of the security posture.

Updates to Counterfit

To help security professionals get a broader view of the security posture of the AI systems, we have also significantly expanded Counterfit. The first release of Counterfit wrapped two popular frameworks—Adversarial Robustness Toolbox (ART) and TextAttack—to provide evasion attacks against models operating on tabular, image, and textual inputs. With the new release, Counterfit now features the following:

An extensible architecture that simplifies integration of new attack frameworks.
Attacks that include both access to the internals of the machine learning model and with just query access to the machine learning model.
Threat paradigms that include evasion, model inversion, model inference, and model extraction.
In addition to algorithmic attacks provided, common corruption attacks through AugLy are also included.
Attacks are supported for models that accept tabular data, images, text, HTML, or Windows executable files as input.

Learn More

These efforts are part of broader investment at Microsoft to empower engineers to securely develop and deploy AI systems. We recommend using it alongside the following resources:

For security analysts to orient to threats against AI systems, Microsoft, in collaboration with MITRE, released an ATT&CK style Adversarial Threat Matrix complete with case studies of attacks on production machine learning systems, which has evolved into MITRE ATLAS.
For security incident responders, we released our own bug bar to systematically triage attacks on machine learning systems.
For developers, we released threat modeling guidance specifically for machine learning systems.
For engineers and policymakers, Microsoft, in collaboration with Berkman Klein Center at Harvard University, released a taxonomy documenting various machine learning failure modes.
For security professionals, Microsoft open sourced Counterfit to help with assessing the posture of AI systems.
For the broader security community, Microsoft hosted the annual Machine Learning Evasion Competition.
For Azure machine learning customers, we provided guidance on enterprise security and governance.

This is a living framework. If you have questions or feedback, please contact us.

To learn more about Microsoft Security solutions, visit our website. Bookmark the Security blog to keep up with our expert coverage on security matters. Also, follow us at @MSFTSecurity for the latest news and updates on cybersecurity.

¹ Gartner, Market Guide for AI Trust, Risk and Security Management, Avivah Litan, et al., 1 September 2021 GARTNER is a registered trademark and service mark of Gartner, Inc. and/or its affiliates in the U.S. and internationally and is used herein with permission. All rights reserved.

The post Best practices for AI security risk management appeared first on Microsoft Security Blog.

AI security risk assessment using Counterfit

Ram Shankar Siva Kumar — Mon, 03 May 2021 16:00:52 +0000

Today, we are releasing Counterfit, an automation tool for security testing AI systems as an open-source project. Counterfit helps organizations conduct AI security risk assessments to ensure that the algorithms used in their businesses are robust, reliable, and trustworthy.

AI systems are increasingly used in critical areas such as healthcare, finance, and defense. Consumers must have confidence that the AI systems powering these important domains are secure from adversarial manipulation. For instance, one of the recommendations from Gartner’s Top 5 Priorities for Managing AI Risk Within Gartner’s MOST Framework published in Jan 2021¹ is that organizations “Adopt specific AI security measures against adversarial attacks to ensure resistance and resilience,” noting that “By 2024, organizations that implement dedicated AI risk management controls will successfully avoid negative AI outcomes twice as often as those that do not.”

However, performing security assessments of production AI systems is nontrivial. Microsoft surveyed 28 organizations, spanning Fortune 500 companies, governments, non-profits, and small and medium sized businesses (SMBs), to understand the current processes in place to secure AI systems. We found that 25 out of 28 businesses indicated they don’t have the right tools in place to secure their AI systems and that security professionals are looking for specific guidance in this space.

This tool was born out of our own need to assess Microsoft’s AI systems for vulnerabilities with the goal of proactively securing AI services, in accordance with Microsoft’s responsible AI principles and Responsible AI Strategy in Engineering (RAISE) initiative. Counterfit started as a corpus of attack scripts written specifically to target individual AI models, and then morphed into a generic automation tool to attack multiple AI systems at scale.

Today, we routinely use Counterfit as part of our AI red team operations. We have found it helpful to automate techniques in MITRE’s Adversarial ML Threat Matrix and replay them against Microsoft’s own production AI services to proactively scan for AI-specific vulnerabilities. Counterfit is also being piloted in the AI development phase to catch vulnerabilities in AI systems before they hit production.

To ensure that Counterfit addresses a broader set of security professionals’ needs, we engaged with a diverse profile of partners spanning large organizations, SMBs, and governmental organizations to test the tool against their ML models in their environments.

“AI is increasingly used in industry; it is vital to look ahead to securing this technology particularly to understand where feature space attacks can be realized in the problem space. The release of open-source tools from an organization such as Microsoft for security practitioners to evaluate the security of AI systems is both welcome and a clear indication that the industry is taking this problem seriously.”

—Matilda Rhode, Senior Cybersecurity Researcher, Airbus

Three key ways Counterfit is flexible

As a result of internal and external engagements, Counterfit is flexible in three key ways:

Counterfit is environment agnostic—it can help assess AI models hosted in any cloud environment, on-premises, or on the edge.
Counterfit is model agnostic—the tool abstracts the internal workings of their AI models so that security professionals can focus on security assessment.
Counterfit strives to be data agnostic—it works on AI models using text, images, or generic input.

Under the hood, Counterfit is a command-line tool that provides a generic automation layer for adversarial AI frameworks such as Adversarial Robustness Toolbox and TextAttack. Our tool makes published attack algorithms accessible to the security community and helps to provide an extensible interface from which to build, manage, and launch attacks on AI models.

Designed for security professionals

Counterfit uses workflows and terminology similar to popular offensive tools that security professionals are already familiar with, such as Metasploit or PowerShell Empyre. Security professionals can benefit from the tool in the following ways:

Penetration testing and red teaming AI systems: The tool comes preloaded with published attack algorithms that can be used to bootstrap red team operations to evade and steal AI models. Since attacking AI systems also involves elements of traditional exploitation, security professionals can use the target interface and built-in cmd2 scripting engine to hook into Counterfit from existing offensive tools. Additionally, the target interface can allow for granular control over network traffic. We recommend using Counterfit alongside Adversarial ML Threat Matrix, which is an ATT&CK style framework released by MITRE and Microsoft for security analysts to orient to threats against AI systems.

Vulnerability scanning for AI systems: The tool can help scan AI models using published attack algorithms. Security professionals can use the defaults, set random parameters, or customize them for broad vulnerability coverage of an AI model. Organizations with multiple models in their AI system can use Counterfit’s built-in automation to scan at scale. Optionally, Counterfit enables organizations to scan AI systems with relevant attacks any number of times to create baselines. Running this system regularly, as vulnerabilities are addressed, also helps to measure ongoing progress toward securing AI systems.
Logging for AI systems: Counterfit also provides logging to record the attacks against a target model. Telemetry may help data science and engineering teams improve their understanding of failure modes in their AI systems.

This tool is part of broader efforts at Microsoft to empower engineers to securely develop and deploy AI systems. We recommend using it alongside the following resources:

For security analysts to orient to threats against AI systems, Microsoft, in collaboration with MITRE, released an ATT&CK style Adversarial ML Threat Matrix complete with case studies of attacks on production ML systems.
For security incident responders, we released our own bug bar to systematically triage attacks on ML systems.
For industry practitioners and security professionals to develop muscle in defending and attacking ML systems, we hosted a realistic Machine Learning Evasion Competition.
For developers, we released threat modeling guidance specifically for ML systems.
For engineers and policymakers, Microsoft, in collaboration with Berkman Klein Center at Harvard University, released a taxonomy documenting various ML failure modes.

Learn more

To learn more about this effort:

Visit the Counterfit Git Hub Repository. If you have specific needs around securing AI systems, please email us at counterfithelpline@microsoft.com.
For a walk-through of Counterfit and live tutorial, please register for our May 10, 2021, webinar.

To learn more about Microsoft Security solutions visit our website. Bookmark the Security blog to keep up with our expert coverage on security matters. Also, follow us at @MSFTSecurity for the latest news and updates on cybersecurity.

¹Gartner, Top 5 Priorities for Managing AI Risk Within Gartner’s MOST Framework, Avivah Litan, et al., 15 January 2021.

The post AI security risk assessment using Counterfit appeared first on Microsoft Security Blog.

Cyberattacks against machine learning systems are more common than you think

Ram Shankar Siva Kumar and Ann Johnson — Thu, 22 Oct 2020 16:00:19 +0000

Machine learning (ML) is making incredible transformations in critical areas such as finance, healthcare, and defense, impacting nearly every aspect of our lives. Many businesses, eager to capitalize on advancements in ML, have not scrutinized the security of their ML systems. Today, along with MITRE, and contributions from 11 organizations including IBM, NVIDIA, Bosch, Microsoft is releasing the Adversarial ML Threat Matrix, an industry-focused open framework, to empower security analysts to detect, respond to, and remediate threats against ML systems.

During the last four years, Microsoft has seen a notable increase in attacks on commercial ML systems. Market reports are also bringing attention to this problem: Gartner’s Top 10 Strategic Technology Trends for 2020, published in October 2019, predicts that “Through 2022, 30% of all AI cyberattacks will leverage training-data poisoning, AI model theft, or adversarial samples to attack AI-powered systems.” Despite these compelling reasons to secure ML systems, Microsoft’s survey spanning 28 businesses found that most industry practitioners have yet to come to terms with adversarial machine learning. Twenty-five out of the 28 businesses indicated that they don’t have the right tools in place to secure their ML systems. What’s more, they are explicitly looking for guidance. We found that preparation is not just limited to smaller organizations. We spoke to Fortune 500 companies, governments, non-profits, and small and mid-sized organizations.

Our survey pointed to marked cognitive dissonance especially among security analysts who generally believe that risk to ML systems is a futuristic concern. This is a problem because cyber attacks on ML systems are now on the uptick. For instance, in 2020 we saw the first CVE for an ML component in a commercial system and SEI/CERT issued the first vuln note bringing to attention how many of the current ML systems can be subjected to arbitrary misclassification attacks assaulting the confidentiality, integrity, and availability of ML systems. The academic community has been sounding the alarm since 2004, and have routinely shown that ML systems, if not mindfully secured, can be compromised.

Introducing the Adversarial ML Threat Matrix

Microsoft worked with MITRE to create the Adversarial ML Threat Matrix, because we believe the first step in empowering security teams to defend against attacks on ML systems, is to have a framework that systematically organizes the techniques employed by malicious adversaries in subverting ML systems. We hope that the security community can use the tabulated tactics and techniques to bolster their monitoring strategies around their organization’s mission critical ML systems.

Primary audience is security analysts: We think that securing ML systems is an infosec problem. The goal of the Adversarial ML Threat Matrix is to position attacks on ML systems in a framework that security analysts can orient themselves in these new and upcoming threats. The matrix is structured like the ATT&CK framework, owing to its wide adoption among the security analyst community – this way, security analysts do not have to learn a new or different framework to learn about threats to ML systems. The Adversarial ML Threat Matrix is also markedly different because the attacks on ML systems are inherently different from traditional attacks on corporate networks.
Grounded in real attacks on ML Systems: We are seeding this framework with a curated set of vulnerabilities and adversary behaviors that Microsoft and MITRE have vetted to be effective against production ML systems. This way, security analysts can focus on realistic threats to ML systems. We also incorporated learnings from Microsoft’s vast experience in this space into the framework: for instance, we found that model stealing is not the end goal of the attacker but in fact leads to more insidious model evasion. We also found that when attacking an ML system, attackers use a combination of “traditional techniques” like phishing and lateral movement alongside adversarial ML techniques.

Open to the community

We recognize that adversarial ML is a significant area of research in academia, so we also garnered input from researchers at the University of Toronto, Cardiff University, and the Software Engineering Institute at Carnegie Mellon University. The Adversarial ML Threat Matrix is a first attempt at collecting known adversary techniques against ML Systems and we invite feedback and contributions. As the threat landscape evolves, this framework will be modified with input from the security and machine learning community.

“When it comes to Machine Learning security, the barriers between public and private endeavors and responsibilities are blurring; public sector challenges like national security will require the cooperation of private actors as much as public investments. So, in order to help address these challenges, we at MITRE are committed to working with organizations like Microsoft and the broader community to identify critical vulnerabilities across the machine learning supply chain.

This framework is a first step in helping to bring communities together to enable organizations to think about the emerging challenges in securing machine learning systems more holistically.”

– Mikel Rodriguez, Director of Machine Learning Research, MITRE

This initiative is part of Microsoft’s commitment to develop and deploy ML systems securely. The AI, Ethics, and Effects in Engineering and Research (Aether) Committee provides guidance to engineers to develop safe, secure, and reliable ML systems and uphold customer trust. To comprehensively protect and monitor ML systems against active attacks, the Azure Trustworthy Machine Learning team routinely assesses the security posture of critical ML systems and works with product teams and front-line defenders from the Microsoft Security Response Center (MSRC) team. The lessons from these activities are routinely shared with the community for various people:

For engineers and policymakers, in collaboration with Berkman Klein Center at Harvard University, we released a taxonomy documenting various ML failure modes.
For developers, we released threat modeling guidance specifically for ML systems.
For security incident responders, we released our own bug bar to systematically triage attacks on ML systems
For academic researchers, Microsoft opened a $300K Security AI RFP, and as a result, partnering with multiple universities to push the boundary in this space.
For industry practitioners and security professionals to develop muscle in defending and attacking ML systems, Microsoft hosted a realistic machine learning evasion competition.

This effort is aimed at security analysts and the broader security community: the matrix and the case studies are meant to help in strategizing protection and detection; the framework seeds attacks on ML systems, so that they can carefully carry out similar exercises in their organizations and validate the monitoring strategies.

To learn more about this effort, visit the Adversarial ML Threat Matrix GitHub repository and read about the topic from MITRE’s announcement, and SEI/CERT blog.

Listen to the Security Unlocked podcast

Hear more from the author of this blog on episode #9 of Security Unlocked. Subscribe for new episodes each week covering the latest in security news.

Listen now

The post Cyberattacks against machine learning systems are more common than you think appeared first on Microsoft Security Blog.

Azure Sentinel uncovers the real threats hidden in billions of low fidelity signals

Ram Shankar Siva Kumar — Thu, 20 Feb 2020 14:00:43 +0000

Cybercrime is as much a people problem as it is a technology problem. To respond effectively, the defender community must harness machine learning to compliment the strengths of people. This is the philosophy that undergirds Azure Sentinel. Azure Sentinel is a cloud-native SIEM that exploits machine learning techniques to empower security analysts, data scientists, and engineers to focus on the threats that matter. You may have heard of similar solutions from other vendors, but the Fusion technology that powers Azure Sentinel sets this SIEM apart for three reasons:

Fusion finds threats that fly under the radar, by combining low fidelity, “yellow” anomalous activities into high fidelity “red” incidents.
Fusion does this by using machine learning to combine disparate data—network, identity, SaaS, endpoint—from both Microsoft and Partner data sources.
Fusion incorporates graph-based machine learning and a probabilistic kill chain to reduce alert fatigue by 90 percent.

Azure Sentinel

Intelligent security analytics for your entire enterprise.

Learn more

You can get a sense of how powerful Fusion is by looking at data from December 2019. During that month, billions of events flowed into Azure Sentinel from thousands of Azure Sentinel customers. Nearly 50 billion anomalous alerts were identified and graphed. After Fusion applied the probabilistic kill chain, the graph was reduced to 110 sub graphs. A second level of machine learning reduced it further to just 25 actionable incidents. This is how Azure Sentinel reduces alert fatigue by 90 percent.

New Fusion scenarios—Microsoft Defender ATP + Palo Alto firewalls

There are currently 35 multi-stage attack scenarios generally available through Fusion machine learning technology in Azure Sentinel. Today, Microsoft has introduced several additional scenarios—in public preview—using Microsoft Defender Advanced Threat Protection (ATP) and Palo Alto logs. This way, you can leverage the power of Sentinel and Microsoft Threat Protection as complementary technologies for the best customer protection.

Detect otherwise missed attacks—By stitching together disparate datasets using Bayesian methods, Fusion helps to detect attacks that could have been missed.
Reduce mean time to remediate—Microsoft Threat Protection provides a best in class investigation experience when addressing alerts from Microsoft products. For non-Microsoft datasets, you can leverage hunting and investigation tools in Azure Sentinel.

Here are a few examples:

An endpoint connects to TOR network followed by suspicious activity on the Internal network—Microsoft Defender ATP detects that a user inside the network made a request to a TOR anonymization service. On its own this incident would be a low-level fidelity. It’s suspicious but doesn’t rise to the level of a high-level threat. Palo Alto firewalls registers anomalous activity from the same IP address, but it isn’t risky enough to block. Separately neither of these alerts get elevated, but together they indicate a multi-stage attack. Fusion makes the connection and promotes it to a high-fidelity incident.

A PowerShell program on an endpoint connects to a suspicious IP address, followed by suspicious activity on the Internal network—Microsoft Defender ATP generates an alert when a PowerShell program makes a suspicious network connection. If Palo Alto allows traffic from that IP address back into the network, Fusion ties the two incidents together to create a high-fidelity incident

An endpoint connects to a suspicious IP followed by anomalous activity on the Internal network—If Microsoft Defender ATP detects an outbound connection to an IP with a history of unauthorized access and Palo Alto firewalls allows an inbound request from that same IP address, it’s elevated by Fusion.

How Fusion works

Construct graph

The process starts by collecting data from several data sources, such as Microsoft products, Microsoft security partner products, and other cloud providers. Each of those security products output anomalous activity, which together can number in the billions or trillions. Fusion gathers all the low and medium level alerts detected in a 30-day window and creates a graph. The graph is hyperconnected and consists of billions of vertices and edges. Each entity is represented by a vertex (or node). For example, a vertex could be a user, an IP address, a virtual machine (VM), or any other entity within the network. The edges (or links) represent all the activities. If a user accesses company resources with a mobile device, both the device and the user are represented as vertices connected by an edge.

Once the graph is built there are still billions of alerts—far too many for any security operations team to make sense of. However, within those connected alerts there may be a pattern that indicates something more serious. The human brain is just not equipped to quickly remove it. This is where machine learning can make a real difference.

Apply probabilistic kill chain

Fusion applies a probabilistic kill chain which acts as a regularizer to the graph. The statistical analysis is based on how real people—Microsoft security experts, vendors, and customers—triage alerts. For example, defenders prioritize kill chains that are time bound. If a kill chain is executed within a day, it will take precedence over one that is enacted over a few days. An even higher priority kill chain is one in which all steps have been completed. This intelligence is encoded into the Fusion machine learning statistical model. Once the probabilistic kill chain is applied, Fusion outputs a smaller number of sub graphs, reducing the number of threats from billions to hundreds.

Score the attack

To reduce the noise further, Fusion uses machine learning to apply a final round of scoring. If labeled data exists, Fusion uses random forests. Labeled data for attacks is generated from the extensive Azure red team that execute these scenarios. If labeled data doesn’t exist Fusion uses spectral clustering.

Some of the criteria used to elevate threats include the number of high impact activity in the graph and whether the subgraph connects to another subgraph.

The output of this machine learning process is tens of threats. These are extremely high priority alerts that require immediate action. Without Fusion, these alerts would likely remain hidden from view, since they can only be seen after two or more low level threats are stitched together to shine a light on stealth activities. AI-generated alerts can now be handed off to people who will determine how to respond.

The great promise of AI in cybersecurity is its ability to enable your cybersecurity people to stay one step ahead of the humans on the other side. AI-backed Fusion is just one example of the innovative potential of partnering technology and people to take on the threats of today and tomorrow.

Learn more

Read more about Azure Sentinel and dig into all the Azure Sentinel detection scenarios.

Also, bookmark the Security blog to keep up with our expert coverage on security matters. Follow us at @MSFTSecurity for the latest news and updates on cybersecurity.

The post Azure Sentinel uncovers the real threats hidden in billions of low fidelity signals appeared first on Microsoft Security Blog.