{"id":134543,"date":"2024-06-04T10:00:00","date_gmt":"2024-06-04T17:00:00","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/security\/blog\/?p=134543"},"modified":"2024-06-25T16:16:37","modified_gmt":"2024-06-25T23:16:37","slug":"ai-jailbreaks-what-they-are-and-how-they-can-be-mitigated","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/security\/blog\/2024\/06\/04\/ai-jailbreaks-what-they-are-and-how-they-can-be-mitigated\/","title":{"rendered":"AI jailbreaks: What they are and how they can be mitigated"},"content":{"rendered":"\n
Generative AI systems are made up of multiple components that interact to provide a rich user experience between the human and the AI model(s). As part of a responsible AI approach<\/a>, AI models are protected by layers of defense mechanisms to prevent the production of harmful content or being used to carry out instructions that go against the intended purpose of the AI integrated application. This blog will provide an understanding of what AI jailbreaks are, why generative AI is susceptible to them, and how you can mitigate the risks and harms.<\/p>\n\n\n\n An AI jailbreak is a technique<\/em> that can cause the failure of guardrails (mitigations<\/em>). The resulting harm<\/em> comes from whatever guardrail was circumvented: for example, causing the system to violate its operators\u2019 policies, make decisions unduly influenced by one user, or execute malicious instructions. This technique<\/em> may be associated with additional attack techniques<\/em> such as prompt injection, evasion, and model manipulation. You can learn more about AI jailbreak techniques in our AI red team\u2019s Microsoft Build session, How Microsoft Approaches AI Red Teaming<\/a>.<\/p>\n\n\nWhat is AI jailbreak?<\/h2>\n\n\n\n