{"id":134543,"date":"2024-06-04T10:00:00","date_gmt":"2024-06-04T17:00:00","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/security\/blog\/?p=134543"},"modified":"2024-06-25T16:16:37","modified_gmt":"2024-06-25T23:16:37","slug":"ai-jailbreaks-what-they-are-and-how-they-can-be-mitigated","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/security\/blog\/2024\/06\/04\/ai-jailbreaks-what-they-are-and-how-they-can-be-mitigated\/","title":{"rendered":"AI jailbreaks: What they are and how they can be mitigated"},"content":{"rendered":"\n<p>Generative AI systems are made up of multiple components that interact to provide a rich user experience between the human and the AI model(s). As part of a <a href=\"https:\/\/www.microsoft.com\/ai\/principles-and-approach\/\">responsible AI approach<\/a>, AI models are protected by layers of defense mechanisms to prevent the production of harmful content or being used to carry out instructions that go against the intended purpose of the AI integrated application. This blog will provide an understanding of what AI jailbreaks are, why generative AI is susceptible to them, and how you can mitigate the risks and harms.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"what-is-ai-jailbreak\">What is AI jailbreak?<\/h2>\n\n\n\n<p>An AI jailbreak is a <em>technique<\/em> that can cause the failure of guardrails (<em>mitigations<\/em>). The resulting <em>harm<\/em> comes from whatever guardrail was circumvented: for example, causing the system to violate its operators\u2019 policies, make decisions unduly influenced by one user, or execute malicious instructions. This <em>technique<\/em> may be associated with additional attack <em>techniques<\/em> such as prompt injection, evasion, and model manipulation. You can learn more about AI jailbreak techniques in our AI red team\u2019s Microsoft Build session, <a href=\"https:\/\/youtu.be\/zFRn_RMSPI4?si=PoAW7Y-VSgiC4R5p\">How Microsoft Approaches AI Red Teaming<\/a>.<\/p>\n\n\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" src=\"https:\/\/www.microsoft.com\/en-us\/security\/blog\/wp-content\/uploads\/2024\/06\/AI-jailbreak-fig1r-1024x478.webp\" alt=\"Diagram of AI safety ontology, which shows relationship of system, harm, technique, and mitigation.\" class=\"wp-image-134569 webp-format\" srcset=\"https:\/\/www.microsoft.com\/en-us\/security\/blog\/wp-content\/uploads\/2024\/06\/AI-jailbreak-fig1r-1024x478.webp 1024w, https:\/\/www.microsoft.com\/en-us\/security\/blog\/wp-content\/uploads\/2024\/06\/AI-jailbreak-fig1r-300x140.webp 300w, https:\/\/www.microsoft.com\/en-us\/security\/blog\/wp-content\/uploads\/2024\/06\/AI-jailbreak-fig1r-768x358.webp 768w, https:\/\/www.microsoft.com\/en-us\/security\/blog\/wp-content\/uploads\/2024\/06\/AI-jailbreak-fig1r.webp 1125w\" data-orig-src=\"https:\/\/www.microsoft.com\/en-us\/security\/blog\/wp-content\/uploads\/2024\/06\/AI-jailbreak-fig1r-1024x478.webp\"><figcaption class=\"wp-element-caption\"><em>Figure 1. AI safety finding ontology&nbsp;<\/em><\/figcaption><\/figure>\n\n\n\n<p>Here is an example of an attempt to ask an AI assistant to provide information about how to build a Molotov cocktail (firebomb). We know this knowledge is built into most of the generative AI models available today, but is prevented from being provided to the user through filters and other techniques to deny this request. Using a technique like <a href=\"https:\/\/aka.ms\/crescendoblog\">Crescendo<\/a>, however, the AI assistant can produce the harmful content that should otherwise have been avoided. This particular problem has since been addressed in Microsoft&#8217;s safety filters; however, AI models are still susceptible to it. Many variations of these attempts are discovered on a regular basis, then tested and mitigated.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"720\" src=\"https:\/\/www.microsoft.com\/en-us\/security\/blog\/wp-content\/uploads\/2024\/06\/AI-jailbreak-fig2.gif\" alt=\"Animated image showing the use of a Crescendo attack to ask ChatGPT to produce harmful content.\" class=\"wp-image-134546\"\/><figcaption class=\"wp-element-caption\"><em>Figure 2. Crescendo attack to build a Molotov cocktail&nbsp;<\/em><\/figcaption><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"why-is-generative-ai-susceptible-to-this-issue\">Why is generative AI susceptible to this issue?<strong><\/strong><\/h2>\n\n\n\n<p>When integrating AI into your applications, consider the characteristics of AI and how they might impact the results and decisions made by this technology. Without anthropomorphizing AI, the interactions are very similar to the issues you might find when dealing with people. You can consider the attributes of an AI language model to be similar to an eager but inexperienced employee trying to help your other employees with their productivity:<\/p>\n\n\n\n<ol class=\"wp-block-list\" start=\"1\">\n<li>Over-confident: They may confidently present ideas or solutions that sound impressive but are not grounded in reality, like an overenthusiastic rookie who hasn&#8217;t learned to distinguish between fiction and fact.<\/li>\n\n\n\n<li>Gullible: They can be easily influenced by how tasks are assigned or how questions are asked, much like a na\u00efve employee who takes instructions too literally or is swayed by the suggestions of others.<\/li>\n\n\n\n<li>Wants to impress: While they generally follow company policies, they can be persuaded to bend the rules or bypass safeguards when pressured or manipulated, like an employee who may cut corners when tempted.<\/li>\n\n\n\n<li>Lack of real-world application: Despite their extensive knowledge, they may struggle to apply it effectively in real-world situations, like a new hire who has studied the theory but may lack practical experience and common sense.<\/li>\n<\/ol>\n\n\n\n<p>In essence, AI language models can be likened to employees who are enthusiastic and knowledgeable but lack the judgment, context understanding, and adherence to boundaries that come with experience and maturity in a business setting.<\/p>\n\n\n\n<p>So we can say that generative AI models and system have the following characteristics:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Imaginative but sometimes unreliable<\/li>\n\n\n\n<li>Suggestible and literal-minded, without appropriate guidance<\/li>\n\n\n\n<li>Persuadable and potentially exploitable<\/li>\n\n\n\n<li>Knowledgeable yet impractical for some scenarios<\/li>\n<\/ul>\n\n\n\n<p>Without the proper protections in place, these systems can not only produce harmful content, but could also carry out unwanted actions and leak sensitive information.<\/p>\n\n\n\n<p>Due to the nature of working with human language, generative capabilities, and the data used in training the models, AI models are non-deterministic, i.e., the same input will not always produce the same outputs. These results can be improved in the training phases, as we saw with the results of increased resilience in Phi-3 based on direct feedback from our AI Red Team. As all generative AI systems are subject to these issues, Microsoft recommends taking a zero-trust approach towards the implementation of AI; assume that any generative AI model could be susceptible to jailbreaking and limit the potential damage that can be done if it is achieved. This requires a layered approach to mitigate, detect, and respond to jailbreaks. <a href=\"https:\/\/www.youtube.com\/watch?v=zFRn_RMSPI4\">Learn more about our AI Red Team approach<\/a>.<\/p>\n\n\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" src=\"https:\/\/www.microsoft.com\/en-us\/security\/blog\/wp-content\/uploads\/2024\/06\/AI-jailbreak-fig3r-1024x364.webp\" alt=\"Diagram of anatomy of an AI application, showing relationship with AI application, AI model, Prompt, and AI user.\" class=\"wp-image-134570 webp-format\" srcset=\"https:\/\/www.microsoft.com\/en-us\/security\/blog\/wp-content\/uploads\/2024\/06\/AI-jailbreak-fig3r-1024x364.webp 1024w, https:\/\/www.microsoft.com\/en-us\/security\/blog\/wp-content\/uploads\/2024\/06\/AI-jailbreak-fig3r-300x107.webp 300w, https:\/\/www.microsoft.com\/en-us\/security\/blog\/wp-content\/uploads\/2024\/06\/AI-jailbreak-fig3r-768x273.webp 768w, https:\/\/www.microsoft.com\/en-us\/security\/blog\/wp-content\/uploads\/2024\/06\/AI-jailbreak-fig3r.webp 1350w\" data-orig-src=\"https:\/\/www.microsoft.com\/en-us\/security\/blog\/wp-content\/uploads\/2024\/06\/AI-jailbreak-fig3r-1024x364.webp\"><figcaption class=\"wp-element-caption\"><em>Figure 3. Anatomy of an AI application<\/em><\/figcaption><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"what-is-the-scope-of-the-problem\">What is the scope of the problem?<\/h2>\n\n\n\n<p>When an AI jailbreak occurs, the severity of the impact is determined by the guardrail that it circumvented. Your response to the issue will depend on the specific situation and if the jailbreak can lead to unauthorized access to content or trigger automated actions. For example, if the harmful content is generated and presented back to a single user, this is an isolated incident that, while harmful, is limited. However, if the jailbreak could result in the system carrying out automated actions, or producing content that could be visible to more than the individual user, then this becomes a more severe incident. As a technique, jailbreaks should not have an incident severity of their own; rather, severities should depend on the consequence of the overall event (you can read about Microsoft\u2019s approach in the <a href=\"https:\/\/www.microsoft.com\/en-us\/msrc\/bounty-ai\">AI bug bounty<\/a> program).<\/p>\n\n\n\n<p>Here are some examples of the types of risks that could occur from an AI jailbreak:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>AI safety and security risks:<ul><li>Unauthorized data access<\/li><\/ul><ul><li>Sensitive data exfiltration<\/li><\/ul><ul><li>Model evasion<\/li><\/ul><ul><li>Generating ransomware<\/li><\/ul>\n<ul class=\"wp-block-list\">\n<li>Circumventing individual policies or compliance systems<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li>Responsible AI risks:<ul><li>Producing content that violates policies (e.g., harmful, offensive, or violent content)<\/li><\/ul><ul><li>Access to dangerous capabilities of the model (e.g., producing actionable instructions for dangerous or criminal activity)<\/li><\/ul><ul><li>Subversion of decision-making systems (e.g., making a loan application or hiring system produce attacker-controlled decisions)<\/li><\/ul><ul><li>Causing the system to misbehave in a newsworthy and screenshot-able way<\/li><\/ul>\n<ul class=\"wp-block-list\">\n<li>IP infringement<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"how-do-ai-jailbreaks-occur\">How do AI jailbreaks occur?<\/h2>\n\n\n\n<p>The two basic families of jailbreak depend on who is doing them:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>A \u201cclassic\u201d jailbreak happens when an authorized operator of the system crafts jailbreak inputs in order to extend their own powers over the system.<\/li>\n\n\n\n<li>Indirect prompt injection happens when a system processes data controlled by a third party (e.g., analyzing incoming emails or documents editable by someone other than the operator) who inserts a malicious payload into that data, which then leads to a jailbreak of the system.<\/li>\n<\/ul>\n\n\n\n<p>You can learn more about both of these types of jailbreaks <a href=\"https:\/\/learn.microsoft.com\/azure\/ai-services\/content-safety\/concepts\/jailbreak-detection\">here<\/a>.<\/p>\n\n\n\n<p>There is a wide range of known jailbreak-like attacks. Some of them (like <a href=\"https:\/\/github.com\/alexisvalentino\/Chatgpt-DAN\">DAN<\/a>) work by adding instructions to a single user input, while others (like <a href=\"https:\/\/crescendo-the-multiturn-jailbreak.github.io\/\">Crescendo<\/a>) act over several turns, gradually shifting the conversation to a particular end. Jailbreaks may use very \u201chuman\u201d techniques such as social psychology, effectively sweet-talking the system into bypassing safeguards, or very \u201cartificial\u201d techniques that <a href=\"https:\/\/arxiv.org\/abs\/2307.15043\">inject strings<\/a> with no obvious human meaning, but which nonetheless could confuse AI systems. Jailbreaks should not, therefore, be regarded as a single technique, but as a group of methodologies in which a guardrail can be talked around by an appropriately crafted input.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"mitigation-and-protection-guidance\">Mitigation and protection guidance<\/h2>\n\n\n\n<p>To mitigate the potential of AI jailbreaks, Microsoft takes defense in depth approach when protecting our AI systems, from models hosted on Azure AI to each Copilot solution we offer. When building your own AI solutions within Azure, the following are some of the key enabling technologies that you can use to implement jailbreak mitigations:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Prompt filtering: <a href=\"https:\/\/learn.microsoft.com\/azure\/ai-services\/content-safety\/concepts\/jailbreak-detection\">Prompt Shields in Azure AI Content Safety<\/a><\/li>\n\n\n\n<li>Identity management: <a href=\"https:\/\/learn.microsoft.com\/entra\/identity\/managed-identities-azure-resources\/overview\">Managed identities for Azure resources<\/a><\/li>\n\n\n\n<li>Data access controls: <a href=\"https:\/\/learn.microsoft.com\/purview\/ai-microsoft-purview\">Microsoft Purview data security for generative AI apps<\/a> &nbsp;&nbsp;<\/li>\n\n\n\n<li>System metaprompt: <a href=\"https:\/\/learn.microsoft.com\/azure\/ai-services\/openai\/concepts\/system-message\">System message framework and template recommendations for large language models (LLMs)<\/a><\/li>\n\n\n\n<li>Content filtering: <a href=\"https:\/\/learn.microsoft.com\/azure\/ai-services\/openai\/concepts\/content-filter?tabs=warning%2Cpython-new\">Azure OpenAI Service content filtering<\/a><\/li>\n\n\n\n<li>Abuse monitoring: <a href=\"https:\/\/learn.microsoft.com\/azure\/ai-services\/openai\/concepts\/abuse-monitoring\">Azure OpenAI Service abuse monitoring<\/a><\/li>\n\n\n\n<li>Model alignment during training: <a href=\"https:\/\/learn.microsoft.com\/training\/paths\/introduction-generative-ai\/\">Microsoft Azure AI Fundamentals: Generative AI &#8211; Training<\/a><\/li>\n\n\n\n<li>Threat protection: <a href=\"https:\/\/learn.microsoft.com\/azure\/defender-for-cloud\/ai-threat-protection\">Microsoft Defender for Cloud threat protection for AI workloads<\/a><\/li>\n<\/ul>\n\n\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" src=\"https:\/\/www.microsoft.com\/en-us\/security\/blog\/wp-content\/uploads\/2024\/06\/AI-jailbreak-fig4r-1024x319.webp\" alt=\"Diagram of layered approach to protecting AI applications, with filters for prompts, identity management and data access controls for the AP application, and content filtering and abuse monitoring for the AI model. \" class=\"wp-image-134571 webp-format\" srcset=\"https:\/\/www.microsoft.com\/en-us\/security\/blog\/wp-content\/uploads\/2024\/06\/AI-jailbreak-fig4r-1024x319.webp 1024w, https:\/\/www.microsoft.com\/en-us\/security\/blog\/wp-content\/uploads\/2024\/06\/AI-jailbreak-fig4r-300x93.webp 300w, https:\/\/www.microsoft.com\/en-us\/security\/blog\/wp-content\/uploads\/2024\/06\/AI-jailbreak-fig4r-768x239.webp 768w, https:\/\/www.microsoft.com\/en-us\/security\/blog\/wp-content\/uploads\/2024\/06\/AI-jailbreak-fig4r.webp 1350w\" data-orig-src=\"https:\/\/www.microsoft.com\/en-us\/security\/blog\/wp-content\/uploads\/2024\/06\/AI-jailbreak-fig4r-1024x319.webp\"><figcaption class=\"wp-element-caption\"><em>Figure 4. Layered approach to protecting AI applications.<\/em><\/figcaption><\/figure>\n\n\n\n<p>With layered defenses, there are increased chances to mitigate, detect, and appropriately respond to any potential jailbreaks.<\/p>\n\n\n\n<p>To empower security professionals and machine learning engineers to proactively find risks in their own generative AI systems, Microsoft has released an open automation framework, Python Risk Identification Toolkit for generative AI (PyRIT). Read more about the release of&nbsp;<a href=\"https:\/\/www.microsoft.com\/en-us\/security\/blog\/2024\/02\/22\/announcing-microsofts-open-automation-framework-to-red-team-generative-ai-systems\/\">PyRIT for generative AI Red teaming<\/a>, and&nbsp;<a href=\"https:\/\/github.com\/Azure\/PyRIT\">access the PyRIT toolkit on GitHub<\/a>.<\/p>\n\n\n\n<p>When building solutions on Azure AI, use the Azure AI Studio capabilities to build benchmarks, create metrics, and implement continuous monitoring and <a href=\"https:\/\/learn.microsoft.com\/azure\/ai-studio\/concepts\/evaluation-approach-gen-ai#evaluating-jailbreak-vulnerability\">evaluation for potential jailbreak issues<\/a>.<\/p>\n\n\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" src=\"https:\/\/www.microsoft.com\/en-us\/security\/blog\/wp-content\/uploads\/2024\/06\/AI-jailbreak-fig5n-1024x479.webp\" alt=\"Diagram showing Azure AI Studio capabilities \" class=\"wp-image-134613 webp-format\" srcset=\"https:\/\/www.microsoft.com\/en-us\/security\/blog\/wp-content\/uploads\/2024\/06\/AI-jailbreak-fig5n-1024x479.webp 1024w, https:\/\/www.microsoft.com\/en-us\/security\/blog\/wp-content\/uploads\/2024\/06\/AI-jailbreak-fig5n-300x140.webp 300w, https:\/\/www.microsoft.com\/en-us\/security\/blog\/wp-content\/uploads\/2024\/06\/AI-jailbreak-fig5n-768x359.webp 768w, https:\/\/www.microsoft.com\/en-us\/security\/blog\/wp-content\/uploads\/2024\/06\/AI-jailbreak-fig5n-1536x718.webp 1536w, https:\/\/www.microsoft.com\/en-us\/security\/blog\/wp-content\/uploads\/2024\/06\/AI-jailbreak-fig5n.webp 1860w\" data-orig-src=\"https:\/\/www.microsoft.com\/en-us\/security\/blog\/wp-content\/uploads\/2024\/06\/AI-jailbreak-fig5n-1024x479.webp\"><figcaption class=\"wp-element-caption\"><em>Figure 5. Azure AI Studio capabilities&nbsp;<\/em><\/figcaption><\/figure>\n\n\n\n<p>If you discover new vulnerabilities in any AI platform, we encourage you to follow responsible disclosure practices for the platform owner. Microsoft\u2019s procedure is explained here:&nbsp;<a href=\"https:\/\/www.microsoft.com\/en-us\/msrc\/bounty-ai\">Microsoft AI Bounty Program<\/a>.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"detection-guidance\">Detection guidance<\/h2>\n\n\n\n<p>Microsoft builds multiple layers of detections into each of our AI hosting and Copilot solutions.<\/p>\n\n\n\n<p>To detect attempts of jailbreak in your own AI systems, you should ensure you have enabled logging and are monitoring interactions in each component, especially the conversation transcripts, system metaprompt, and the prompt completions generated by the AI model.<\/p>\n\n\n\n<p>Microsoft recommends setting the <a href=\"https:\/\/learn.microsoft.com\/azure\/ai-services\/content-safety\/overview\">Azure AI Content Safety<\/a> filter severity threshold to the most restrictive options, suitable for your application. You can also use Azure AI Studio to begin the evaluation of your AI application safety with the following guidance: <a href=\"https:\/\/learn.microsoft.com\/azure\/ai-studio\/concepts\/evaluation-approach-gen-ai#evaluating-jailbreak-vulnerability\">Evaluation of generative AI applications with Azure AI Studio<\/a>.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"summary\">Summary<\/h2>\n\n\n\n<p>This article provides the foundational guidance and understanding of AI jailbreaks. In future blogs, we will explain the specifics of any newly discovered jailbreak techniques. Each one will articulate the following key points:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>We will describe the jailbreak technique discovered and how it works, with evidential testing results.<\/li>\n\n\n\n<li>We will have followed responsible disclosure practices to provide insights to the affected AI providers, ensuring they have suitable time to implement mitigations.<\/li>\n\n\n\n<li>We will explain how Microsoft\u2019s own AI systems have been updated to implement mitigations to the jailbreak.<\/li>\n\n\n\n<li>We will provide detection and mitigation information to assist others to implement their own further defenses in their AI systems.<\/li>\n<\/ol>\n\n\n\n<p><\/p>\n\n\n\n<p><em><strong>Richard Diver<br><\/strong>Microsoft Security<\/em><\/p>\n\n\n\n<p><\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"learn-more\">Learn more<\/h2>\n\n\n\n<p>For the latest security research from the Microsoft Threat Intelligence community, check out the Microsoft Threat Intelligence Blog:&nbsp;<a href=\"https:\/\/aka.ms\/threatintelblog\">https:\/\/aka.ms\/threatintelblog<\/a>.<\/p>\n\n\n\n<p>To get notified about new publications and to join discussions on social media, follow us on LinkedIn at <a href=\"https:\/\/www.linkedin.com\/showcase\/microsoft-threat-intelligence\">https:\/\/www.linkedin.com\/showcase\/microsoft-threat-intelligence<\/a>, and on X (formerly Twitter) at&nbsp;<a href=\"https:\/\/twitter.com\/MsftSecIntel\" target=\"_blank\" rel=\"noreferrer noopener\">https:\/\/twitter.com\/MsftSecIntel<\/a>.<\/p>\n\n\n\n<p>To hear stories and insights from the Microsoft Threat Intelligence community about the ever-evolving threat landscape, listen to the Microsoft Threat Intelligence podcast: <a href=\"https:\/\/thecyberwire.com\/podcasts\/microsoft-threat-intelligence\">https:\/\/thecyberwire.com\/podcasts\/microsoft-threat-intelligence<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Microsoft security researchers, in partnership with other security experts, continue to proactively explore and discover new types of AI model and system vulnerabilities. In this post we are providing information about AI jailbreaks, a family of vulnerabilities that can occur when the defenses implemented to protect AI from producing harmful content fails. This article will be a useful reference for future announcements of new jailbreak techniques.<\/p>\n","protected":false},"author":68,"featured_media":134554,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"ms_queue_id":[],"ep_exclude_from_search":false,"_classifai_error":"","_classifai_text_to_speech_error":"","footnotes":""},"content-type":[3663],"topic":[3664,3687],"products":[],"threat-intelligence":[4237,3727],"tags":[],"coauthors":[3380],"class_list":["post-134543","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","content-type-research","topic-ai-and-machine-learning","topic-threat-intelligence","threat-intelligence-ai-threats","threat-intelligence-attacker-techniques-tools-and-infrastructure","review-flag-1694638265-576","review-flag-1-1694638265-354","review-flag-2-1694638266-864","review-flag-3-1694638266-241","review-flag-4-1694638266-512","review-flag-5-1694638266-171","review-flag-alway-1694638263-571","review-flag-machi-1694638272-641","review-flag-new-1694638263-340"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v24.5 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>AI jailbreaks: What they are and how they can be mitigated | Microsoft Security Blog<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.microsoft.com\/en-us\/security\/blog\/2024\/06\/04\/ai-jailbreaks-what-they-are-and-how-they-can-be-mitigated\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"AI jailbreaks: What they are and how they can be mitigated | Microsoft Security Blog\" \/>\n<meta property=\"og:description\" content=\"Microsoft security researchers, in partnership with other security experts, continue to proactively explore and discover new types of AI model and system vulnerabilities. In this post we are providing information about AI jailbreaks, a family of vulnerabilities that can occur when the defenses implemented to protect AI from producing harmful content fails. This article will be a useful reference for future announcements of new jailbreak techniques.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.microsoft.com\/en-us\/security\/blog\/2024\/06\/04\/ai-jailbreaks-what-they-are-and-how-they-can-be-mitigated\/\" \/>\n<meta property=\"og:site_name\" content=\"Microsoft Security Blog\" \/>\n<meta property=\"article:published_time\" content=\"2024-06-04T17:00:00+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2024-06-25T23:16:37+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.microsoft.com\/en-us\/security\/blog\/wp-content\/uploads\/2024\/06\/AI-jailbreak-featured.png\" \/>\n\t<meta property=\"og:image:width\" content=\"1200\" \/>\n\t<meta property=\"og:image:height\" content=\"800\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"Microsoft Threat Intelligence\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:image\" content=\"https:\/\/www.microsoft.com\/en-us\/security\/blog\/wp-content\/uploads\/2024\/06\/AI-jailbreak-featured.png\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Microsoft Threat Intelligence\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"7 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/www.microsoft.com\/en-us\/security\/blog\/2024\/06\/04\/ai-jailbreaks-what-they-are-and-how-they-can-be-mitigated\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/www.microsoft.com\/en-us\/security\/blog\/2024\/06\/04\/ai-jailbreaks-what-they-are-and-how-they-can-be-mitigated\/\"},\"author\":[{\"@id\":\"https:\/\/www.microsoft.com\/en-us\/security\/blog\/author\/microsoft-security-threat-intelligence\/\",\"@type\":\"Person\",\"@name\":\"Microsoft Threat Intelligence\"}],\"headline\":\"AI jailbreaks: What they are and how they can be mitigated\",\"datePublished\":\"2024-06-04T17:00:00+00:00\",\"dateModified\":\"2024-06-25T23:16:37+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/www.microsoft.com\/en-us\/security\/blog\/2024\/06\/04\/ai-jailbreaks-what-they-are-and-how-they-can-be-mitigated\/\"},\"wordCount\":1793,\"publisher\":{\"@id\":\"https:\/\/www.microsoft.com\/en-us\/security\/blog\/#organization\"},\"image\":{\"@id\":\"https:\/\/www.microsoft.com\/en-us\/security\/blog\/2024\/06\/04\/ai-jailbreaks-what-they-are-and-how-they-can-be-mitigated\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.microsoft.com\/en-us\/security\/blog\/wp-content\/uploads\/2024\/06\/AI-jailbreak-featured.webp\",\"inLanguage\":\"en-US\"},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.microsoft.com\/en-us\/security\/blog\/2024\/06\/04\/ai-jailbreaks-what-they-are-and-how-they-can-be-mitigated\/\",\"url\":\"https:\/\/www.microsoft.com\/en-us\/security\/blog\/2024\/06\/04\/ai-jailbreaks-what-they-are-and-how-they-can-be-mitigated\/\",\"name\":\"AI jailbreaks: What they are and how they can be mitigated | Microsoft Security Blog\",\"isPartOf\":{\"@id\":\"https:\/\/www.microsoft.com\/en-us\/security\/blog\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/www.microsoft.com\/en-us\/security\/blog\/2024\/06\/04\/ai-jailbreaks-what-they-are-and-how-they-can-be-mitigated\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/www.microsoft.com\/en-us\/security\/blog\/2024\/06\/04\/ai-jailbreaks-what-they-are-and-how-they-can-be-mitigated\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.microsoft.com\/en-us\/security\/blog\/wp-content\/uploads\/2024\/06\/AI-jailbreak-featured.webp\",\"datePublished\":\"2024-06-04T17:00:00+00:00\",\"dateModified\":\"2024-06-25T23:16:37+00:00\",\"breadcrumb\":{\"@id\":\"https:\/\/www.microsoft.com\/en-us\/security\/blog\/2024\/06\/04\/ai-jailbreaks-what-they-are-and-how-they-can-be-mitigated\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.microsoft.com\/en-us\/security\/blog\/2024\/06\/04\/ai-jailbreaks-what-they-are-and-how-they-can-be-mitigated\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.microsoft.com\/en-us\/security\/blog\/2024\/06\/04\/ai-jailbreaks-what-they-are-and-how-they-can-be-mitigated\/#primaryimage\",\"url\":\"https:\/\/www.microsoft.com\/en-us\/security\/blog\/wp-content\/uploads\/2024\/06\/AI-jailbreak-featured.webp\",\"contentUrl\":\"https:\/\/www.microsoft.com\/en-us\/security\/blog\/wp-content\/uploads\/2024\/06\/AI-jailbreak-featured.webp\",\"width\":1200,\"height\":800,\"caption\":\"Photo of developer evaluating data from intelligent apps built in Azure\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.microsoft.com\/en-us\/security\/blog\/2024\/06\/04\/ai-jailbreaks-what-they-are-and-how-they-can-be-mitigated\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.microsoft.com\/en-us\/security\/blog\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"AI jailbreaks: What they are and how they can be mitigated\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.microsoft.com\/en-us\/security\/blog\/#website\",\"url\":\"https:\/\/www.microsoft.com\/en-us\/security\/blog\/\",\"name\":\"Microsoft Security Blog\",\"description\":\"Expert coverage of cybersecurity topics\",\"publisher\":{\"@id\":\"https:\/\/www.microsoft.com\/en-us\/security\/blog\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.microsoft.com\/en-us\/security\/blog\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/www.microsoft.com\/en-us\/security\/blog\/#organization\",\"name\":\"Microsoft Security Blog\",\"url\":\"https:\/\/www.microsoft.com\/en-us\/security\/blog\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.microsoft.com\/en-us\/security\/blog\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/www.microsoft.com\/en-us\/security\/blog\/wp-content\/uploads\/2018\/08\/cropped-cropped-microsoft_logo_element.png\",\"contentUrl\":\"https:\/\/www.microsoft.com\/en-us\/security\/blog\/wp-content\/uploads\/2018\/08\/cropped-cropped-microsoft_logo_element.png\",\"width\":512,\"height\":512,\"caption\":\"Microsoft Security Blog\"},\"image\":{\"@id\":\"https:\/\/www.microsoft.com\/en-us\/security\/blog\/#\/schema\/logo\/image\/\"}}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"AI jailbreaks: What they are and how they can be mitigated | Microsoft Security Blog","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.microsoft.com\/en-us\/security\/blog\/2024\/06\/04\/ai-jailbreaks-what-they-are-and-how-they-can-be-mitigated\/","og_locale":"en_US","og_type":"article","og_title":"AI jailbreaks: What they are and how they can be mitigated | Microsoft Security Blog","og_description":"Microsoft security researchers, in partnership with other security experts, continue to proactively explore and discover new types of AI model and system vulnerabilities. In this post we are providing information about AI jailbreaks, a family of vulnerabilities that can occur when the defenses implemented to protect AI from producing harmful content fails. This article will be a useful reference for future announcements of new jailbreak techniques.","og_url":"https:\/\/www.microsoft.com\/en-us\/security\/blog\/2024\/06\/04\/ai-jailbreaks-what-they-are-and-how-they-can-be-mitigated\/","og_site_name":"Microsoft Security Blog","article_published_time":"2024-06-04T17:00:00+00:00","article_modified_time":"2024-06-25T23:16:37+00:00","og_image":[{"width":1200,"height":800,"url":"https:\/\/www.microsoft.com\/en-us\/security\/blog\/wp-content\/uploads\/2024\/06\/AI-jailbreak-featured.png","type":"image\/png"}],"author":"Microsoft Threat Intelligence","twitter_card":"summary_large_image","twitter_image":"https:\/\/www.microsoft.com\/en-us\/security\/blog\/wp-content\/uploads\/2024\/06\/AI-jailbreak-featured.png","twitter_misc":{"Written by":"Microsoft Threat Intelligence","Est. reading time":"7 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.microsoft.com\/en-us\/security\/blog\/2024\/06\/04\/ai-jailbreaks-what-they-are-and-how-they-can-be-mitigated\/#article","isPartOf":{"@id":"https:\/\/www.microsoft.com\/en-us\/security\/blog\/2024\/06\/04\/ai-jailbreaks-what-they-are-and-how-they-can-be-mitigated\/"},"author":[{"@id":"https:\/\/www.microsoft.com\/en-us\/security\/blog\/author\/microsoft-security-threat-intelligence\/","@type":"Person","@name":"Microsoft Threat Intelligence"}],"headline":"AI jailbreaks: What they are and how they can be mitigated","datePublished":"2024-06-04T17:00:00+00:00","dateModified":"2024-06-25T23:16:37+00:00","mainEntityOfPage":{"@id":"https:\/\/www.microsoft.com\/en-us\/security\/blog\/2024\/06\/04\/ai-jailbreaks-what-they-are-and-how-they-can-be-mitigated\/"},"wordCount":1793,"publisher":{"@id":"https:\/\/www.microsoft.com\/en-us\/security\/blog\/#organization"},"image":{"@id":"https:\/\/www.microsoft.com\/en-us\/security\/blog\/2024\/06\/04\/ai-jailbreaks-what-they-are-and-how-they-can-be-mitigated\/#primaryimage"},"thumbnailUrl":"https:\/\/www.microsoft.com\/en-us\/security\/blog\/wp-content\/uploads\/2024\/06\/AI-jailbreak-featured.webp","inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/www.microsoft.com\/en-us\/security\/blog\/2024\/06\/04\/ai-jailbreaks-what-they-are-and-how-they-can-be-mitigated\/","url":"https:\/\/www.microsoft.com\/en-us\/security\/blog\/2024\/06\/04\/ai-jailbreaks-what-they-are-and-how-they-can-be-mitigated\/","name":"AI jailbreaks: What they are and how they can be mitigated | Microsoft Security Blog","isPartOf":{"@id":"https:\/\/www.microsoft.com\/en-us\/security\/blog\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.microsoft.com\/en-us\/security\/blog\/2024\/06\/04\/ai-jailbreaks-what-they-are-and-how-they-can-be-mitigated\/#primaryimage"},"image":{"@id":"https:\/\/www.microsoft.com\/en-us\/security\/blog\/2024\/06\/04\/ai-jailbreaks-what-they-are-and-how-they-can-be-mitigated\/#primaryimage"},"thumbnailUrl":"https:\/\/www.microsoft.com\/en-us\/security\/blog\/wp-content\/uploads\/2024\/06\/AI-jailbreak-featured.webp","datePublished":"2024-06-04T17:00:00+00:00","dateModified":"2024-06-25T23:16:37+00:00","breadcrumb":{"@id":"https:\/\/www.microsoft.com\/en-us\/security\/blog\/2024\/06\/04\/ai-jailbreaks-what-they-are-and-how-they-can-be-mitigated\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.microsoft.com\/en-us\/security\/blog\/2024\/06\/04\/ai-jailbreaks-what-they-are-and-how-they-can-be-mitigated\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.microsoft.com\/en-us\/security\/blog\/2024\/06\/04\/ai-jailbreaks-what-they-are-and-how-they-can-be-mitigated\/#primaryimage","url":"https:\/\/www.microsoft.com\/en-us\/security\/blog\/wp-content\/uploads\/2024\/06\/AI-jailbreak-featured.webp","contentUrl":"https:\/\/www.microsoft.com\/en-us\/security\/blog\/wp-content\/uploads\/2024\/06\/AI-jailbreak-featured.webp","width":1200,"height":800,"caption":"Photo of developer evaluating data from intelligent apps built in Azure"},{"@type":"BreadcrumbList","@id":"https:\/\/www.microsoft.com\/en-us\/security\/blog\/2024\/06\/04\/ai-jailbreaks-what-they-are-and-how-they-can-be-mitigated\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.microsoft.com\/en-us\/security\/blog\/"},{"@type":"ListItem","position":2,"name":"AI jailbreaks: What they are and how they can be mitigated"}]},{"@type":"WebSite","@id":"https:\/\/www.microsoft.com\/en-us\/security\/blog\/#website","url":"https:\/\/www.microsoft.com\/en-us\/security\/blog\/","name":"Microsoft Security Blog","description":"Expert coverage of cybersecurity topics","publisher":{"@id":"https:\/\/www.microsoft.com\/en-us\/security\/blog\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.microsoft.com\/en-us\/security\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.microsoft.com\/en-us\/security\/blog\/#organization","name":"Microsoft Security Blog","url":"https:\/\/www.microsoft.com\/en-us\/security\/blog\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.microsoft.com\/en-us\/security\/blog\/#\/schema\/logo\/image\/","url":"https:\/\/www.microsoft.com\/en-us\/security\/blog\/wp-content\/uploads\/2018\/08\/cropped-cropped-microsoft_logo_element.png","contentUrl":"https:\/\/www.microsoft.com\/en-us\/security\/blog\/wp-content\/uploads\/2018\/08\/cropped-cropped-microsoft_logo_element.png","width":512,"height":512,"caption":"Microsoft Security Blog"},"image":{"@id":"https:\/\/www.microsoft.com\/en-us\/security\/blog\/#\/schema\/logo\/image\/"}}]}},"msxcm_display_generated_audio":false,"distributor_meta":false,"distributor_terms":false,"distributor_media":false,"distributor_original_site_name":"Microsoft Security Blog","distributor_original_site_url":"https:\/\/www.microsoft.com\/en-us\/security\/blog","push-errors":false,"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/security\/blog\/wp-json\/wp\/v2\/posts\/134543","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/security\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/security\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/security\/blog\/wp-json\/wp\/v2\/users\/68"}],"replies":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/security\/blog\/wp-json\/wp\/v2\/comments?post=134543"}],"version-history":[{"count":15,"href":"https:\/\/www.microsoft.com\/en-us\/security\/blog\/wp-json\/wp\/v2\/posts\/134543\/revisions"}],"predecessor-version":[{"id":134614,"href":"https:\/\/www.microsoft.com\/en-us\/security\/blog\/wp-json\/wp\/v2\/posts\/134543\/revisions\/134614"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/security\/blog\/wp-json\/wp\/v2\/media\/134554"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/security\/blog\/wp-json\/wp\/v2\/media?parent=134543"}],"wp:term":[{"taxonomy":"content-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/security\/blog\/wp-json\/wp\/v2\/content-type?post=134543"},{"taxonomy":"topic","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/security\/blog\/wp-json\/wp\/v2\/topic?post=134543"},{"taxonomy":"products","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/security\/blog\/wp-json\/wp\/v2\/products?post=134543"},{"taxonomy":"threat-intelligence","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/security\/blog\/wp-json\/wp\/v2\/threat-intelligence?post=134543"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/security\/blog\/wp-json\/wp\/v2\/tags?post=134543"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/security\/blog\/wp-json\/wp\/v2\/coauthors?post=134543"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}