AI threats | Latest Threats | Microsoft Security Blog

Microsoft shares latest intelligence on North Korean and Chinese threat actors at CYBERWARCON

Microsoft Threat Intelligence — Fri, 22 Nov 2024 11:00:00 +0000

This year at CYBERWARCON, Microsoft Threat Intelligence analysts are sharing research and insights representing years of threat actor tracking, infrastructure monitoring and disruption, and attacker tooling.

North Korean threat landscape

Listen to the Microsoft Threat Intelligence podcast episode

The talk DPRK – All grown up will cover how the Democratic People’s Republic of Korea (DPRK) has successfully built computer network exploitation capability over the past 10 years and how threat actors have enabled North Korea to steal billions of dollars in cryptocurrency as well as target organizations associated with satellites and weapons systems. Over this period, North Korean threat actors have developed and used multiple zero-day exploits and have become experts in cryptocurrency, blockchain, and AI technology.

This presentation will also include information on North Korea overcoming sanctions and other financial barriers by the United States and multiple other countries through the deployment of North Korean IT workers in Russia, China, and, other countries. These IT workers masquerade as individuals from countries other than North Korea to perform legitimate IT work and generate revenue for the regime. North Korean threat actors’ focus areas are:

Stealing money or cryptocurrency to help fund the North Korea weapons programs
Stealing information pertaining to weapons systems, sanctions information, and policy-related decisions before they occur
Performing IT work to generate revenue to help fund the North Korea IT weapons program

Meanwhile, in the talk No targets left behind, Microsoft Threat Intelligence analysts will present research on Storm-2077, a Chinese threat actor that conducts intelligence collection targeting government agencies and non-governmental organizations. This presentation will trace how Microsoft assembled the pieces of threat activity now tracked as Storm-2077 to demonstrate how we overcome challenges in tracking overlapping activities and attributing cyber operations originating from China.

This blog summarizes intelligence on threat actors covered by the two Microsoft presentations at CYBERWARCON.

The North Korean threat actor that Microsoft tracks as Sapphire Sleet has been conducting cryptocurrency theft as well as computer network exploitation activities since at least 2020. Microsoft’s analysis of Sapphire Sleet activity indicates that over 10 million US dollars’ worth of cryptocurrency was stolen by the threat actor from multiple companies over a six-month period.

Masquerading as a venture capitalist

While their methods have changed throughout the years, the primary scheme used by Sapphire Sleet over the past year and a half is to masquerade as a venture capitalist, feigning interest in investing in the target user’s company. The threat actor sets up an online meeting with a target user. On the day of the meeting, when the target user attempts to connect to the meeting, the user receives either a frozen screen or an error message stating that the user should contact the room administrator or support team for assistance.

When the target contacts the threat actor, the threat actor sends a script – a .scpt file (Mac) or a Visual Basic Script (.vbs) file (Windows) – to “fix the connection issue”. This script leads to malware being downloaded onto the target user’s device. The threat actor then works towards obtaining cryptocurrency wallets and other credentials on the compromised device, enabling the threat actor to steal cryptocurrency.

Posing as recruiters

As a secondary method, Sapphire Sleet masquerades as a recruiter on professional platforms like LinkedIn and reaches out to potential victims. The threat actor, posing as a recruiter, tells the target user that they have a job they are trying to fill and believe that the user would be a good candidate. To validate the skills listed on the target user’s profile, the threat actor asks the user to complete a skills assessment from a website under the threat actor’s control. The threat actor sends the target user a sign-in account and password. In signing in to the website and downloading the code associated with the skills assessment, the target user downloads malware onto their device, allowing the attackers to gain access to the system.

Figure 1. LinkedIn profiles of fake recruiters. LinkedIn accounts identified to be related to this attack have been taken down.

Ruby Sleet, a threat actor that Microsoft has been tracking since 2020, has significantly increased the sophistication of their phishing operations over the past several years. The threat actor has been observed signing their malware with legitimate (but compromised) certificates obtained from victims they have compromised. The threat actor has also distributed backdoored virtual private network (VPN) clients, installers, and various other legitimate software.

Ruby Sleet has also been observed conducting research on targets to find what specific software they run in their environment. The threat actor has developed custom capabilities tailored to specific targets. For example, in December 2023, Microsoft Threat Intelligence observed Ruby Sleet carrying out a supply chain attack in which the threat actor successfully compromised a Korean construction company and replaced a legitimate version of VeraPort software with a version that communicates with known Ruby Sleet infrastructure.

Ruby Sleet has targeted and successfully compromised aerospace and defense-related organizations. Stealing aerospace and defense-related technology may be used by North Korea to increase its understanding of missiles, drones, and other related technologies.

North Korean IT workers: The triple threat

In addition to utilizing computer network exploitation through the years, North Korea has dispatched thousands of IT workers abroad to earn money for the regime. These IT workers have brought in hundreds of millions of dollars for North Korea. We consider these North Korean IT workers to be a triple threat, because they:

Make money for the regime by performing “legitimate” IT work
May use their access to obtain sensitive intellectual property, source code, or trade secrets at the company
Steal sensitive data from the company and in some cases ransom the company into paying them in exchange for not publicly disclosing the company’s data

Microsoft Threat Intelligence has observed North Korean IT workers operating out of North Korea, Russia, and China.

Facilitators complicate tracking of IT worker ecosystem

Microsoft Threat Intelligence observed that the activities of North Korean IT workers involved many different parties, from creating accounts on various platforms to accepting payments and moving money to North Korean IT worker-controlled accounts. This makes tracking their activities more challenging than traditional nation-state threat actors.

Since it’s difficult for a person in North Korea to sign up for things such as a bank account or phone number, the IT workers must utilize facilitators to help them acquire access to platforms where they can apply for remote jobs. These facilitators are used by the IT workers for tasks such as creating an account on a freelance job website. As the relationship builds, the IT workers may ask the facilitator to perform other tasks such as:

Creating or renting their bank account to the North Korean IT worker
Creating LinkedIn accounts to be used for contacting recruiters to obtain work
Purchasing mobile phone numbers or SIM cards
Creating additional accounts on freelance job sites

Figure 2. The North Korean IT worker ecosystem

Fake profiles and portfolios with the aid of AI

One of the first things a North Korean IT worker does is set up a portfolio to show supposed examples of their previous work. Microsoft Threat Intelligence has observed hundreds of fake profiles and portfolios for North Korean IT workers on developer platforms like GitHub.

Figure 3. Example profile used by North Korean IT workers that has since been taken down.

Additionally, the North Korean IT workers have used fake profiles on LinkedIn to communicate with recruiters and apply for jobs.

Figure 4. An example of a North Korean IT worker LinkedIn profile that has since been taken down.

In October 2024, Microsoft found a public repository containing North Korean IT worker files. The repository contained the following information:

Resumes and email accounts used by the North Korean IT workers
Infrastructure used by these workers (VPS and VPN accounts along with specific VPS IP addresses)
Playbooks on conducting identity theft and creating and bidding jobs on freelancer websites without getting flagged
Actual images and AI-enhanced images of suspected North Korean IT workers
Wallet information and suspected payments made to facilitators
LinkedIn, GitHub, Upwork, TeamViewer, Telegram, and Skype accounts
Tracking sheet of work performed and payments received by these IT workers

Review of the repository indicates that the North Korean IT workers are conducting identity theft and using AI tools such as Faceswap to move their picture over to documents that they have stolen from victims. The attackers are also using Faceswap to take pictures of the North Korean IT workers and move them to more professional looking settings. The pictures created by the North Korean IT workers using AI tools are then utilized on resumes or profiles, sometimes for multiple personas, that are submitted for job applications.

Figure 5. Use of AI apps to modify photos used for North Korean IT workers’ resumes and profiles

Figure 6. Examples of resumes for North Korean IT workers. These two resumes use different versions of the same photo.

In the same repository, Microsoft Threat Intelligence found photos that appear to be of North Korean IT workers:

Figure 7. Photos of potential North Korean IT workers

Microsoft has observed that, in addition to using AI to assist with creating images used with job applications, North Korean IT workers are experimenting with other AI technologies such as voice-changing software. This aligns with observations shared in earlier blogs showing threat actors using AI as a productivity tool to refine their attack techniques. While we do not see threat actors using combined AI voice and video products as a tactic, we do recognize that if actors were to combine these technologies, it’s possible that future campaigns may involve IT workers using these programs to attempt to trick interviewers into thinking they are not communicating with a North Korean IT worker. If successful, this could allow the North Korean IT workers to do interviews directly and not have to rely on facilitators obtaining work for them by standing in on interviews or selling account access to them.

Getting payment for remote work

The North Korean IT workers appear to be very organized when it comes to tracking payments received. Overall, this group of North Korean IT workers appears to have made at least 370,000 US dollars through their efforts.

Protecting organizations from North Korean IT workers

Unfortunately, computer network exploitation and use of IT workers is a low-risk, high-reward technique used by North Korean threat actors. Here are some steps that organizations can take to be better protected:

Follow guidance from the US Department of State, US Department of the Treasury, and the Federal Bureau of Investigation on how to spot North Korean IT workers.
Educate human resources managers, hiring managers, and program managers for signs to look for when dealing with suspected North Korean IT workers.
Use simple non-technical techniques such as asking IT workers to turn on their camera periodically and comparing the person on camera with the one that picked up the laptop from your organization.
Ask the person on camera to walk through or explain code that they purportedly wrote.

Storm-2077: No targets left behind

Over the past decade, following numerous government indictments and the public disclosure of threat actors’ activities, tracking and attributing cyber operations originating from China has become increasingly challenging as the attackers adjust their tactics. These threat actors continue to conduct operations while using tooling and techniques against targets that often overlap with another threat actor’s operation. While analyzing activity that was affecting a handful of customers, Microsoft Threat Intelligence assembled the pieces of what would be tracked as Storm-2077. Undoubtably, this actor had some victimology and operational techniques that overlapped with a couple of threat actors that Microsoft was already tracking.

Microsoft assesses that Storm-2077 is a China state threat actor that has been active since at least January 2024. Storm-2077 has targeted a wide variety of sectors, including government agencies and non-governmental organizations in the United States. As we continued to track Storm-2077, we observed that they went after several other industries worldwide, including the Defense Industrial Base (DIB), aviation, telecommunications, and financial and legal services. Storm-2077 overlaps with activity tracked by other security vendors as TAG-100.

We assess that Storm-2077 likely operates with the objective of conducting intelligence collection. Storm-2077 has used phishing emails to gain credentials and, in certain cases, likely exploited edge-facing devices to gain initial access. We have observed techniques that focus on email data theft, which could allow them to analyze the data later without risking immediate loss of access. In some cases, Storm-2077 has used valid credentials harvested from the successful compromise of a system.

We’ve also observed Storm-2077 successfully exfiltrate emails by stealing credentials to access legitimate cloud applications such as eDiscovery applications. In other cases, Storm-2077 has been observed gaining access to cloud environments by harvesting credentials from compromised endpoints. Once administrative access was gained, Storm-2077 created their own application with mail read rights.

Access to email data is crucial for threat actors because it often contains sensitive information that could be utilized later for malicious purposes. Emails can include sign-in credentials, confidential communication, financial records, business secrets, intellectual property, and credentials for accessing critical systems, or employee information. Access to email accounts and the ability to steal email communication could enable an attacker to further their operations.

Microsoft’s talk on Storm-2077 at CYBERWARCON will highlight how vast their targeting interest covers. All sectors appear to be on the table, leaving no targets behind. Our analysts will talk about the challenges of tracking China-based threat actors and how they had to distinctly carve out Storm-2077.

CYBERWARCON Recap

At this year’s CYBERWARCON, Microsoft Security is sponsoring the post-event Fireside Recap. Hosted by Sherrod DeGrippo, this session will feature special guests who will dive into the highlights, key insights, and emerging themes that defined CYBERWARCON 2024. Interviews with speakers will offer exclusive insights and bring the conference’s biggest moments into sharp focus.

Learn more

For the latest security research from the Microsoft Threat Intelligence community, check out the Microsoft Threat Intelligence Blog: https://aka.ms/threatintelblog.

To get notified about new publications and to join discussions on social media, follow us on LinkedIn at https://www.linkedin.com/showcase/microsoft-threat-intelligence, and on X (formerly Twitter) at https://twitter.com/MsftSecIntel.

To hear stories and insights from the Microsoft Threat Intelligence community about the ever-evolving threat landscape, listen to the Microsoft Threat Intelligence podcast: https://thecyberwire.com/podcasts/microsoft-threat-intelligence.

The post Microsoft shares latest intelligence on North Korean and Chinese threat actors at CYBERWARCON appeared first on Microsoft Security Blog.

Mitigating Skeleton Key, a new type of generative AI jailbreak technique

Mark Russinovich — Wed, 26 Jun 2024 17:00:00 +0000

In generative AI, jailbreaks, also known as direct prompt injection attacks, are malicious user inputs that attempt to circumvent an AI model’s intended behavior. A successful jailbreak has potential to subvert all or most responsible AI (RAI) guardrails built into the model through its training by the AI vendor, making risk mitigations across other layers of the AI stack a critical design choice as part of defense in depth.

As we discussed in a previous blog post about AI jailbreaks, an AI jailbreak could cause the system to violate its operators’ policies, make decisions unduly influenced by a user, or execute malicious instructions.

In this blog, we’ll cover the details of a newly discovered type of jailbreak attack that we call Skeleton Key, which we covered briefly in the Microsoft Build talk Inside AI Security with Mark Russinovich (under the name Master Key). Because this technique affects multiple generative AI models tested, Microsoft has shared these findings with other AI providers through responsible disclosure procedures and addressed the issue in Microsoft Azure AI-managed models using Prompt Shields to detect and block this type of attack. Microsoft has also made software updates to the large language model (LLM) technology behind Microsoft’s additional AI offerings, including our Copilot AI assistants, to mitigate the impact of this guardrail bypass.

Introducing Skeleton Key

This AI jailbreak technique works by using a multi-turn (or multiple step) strategy to cause a model to ignore its guardrails. Once guardrails are ignored, a model will not be able to determine malicious or unsanctioned requests from any other. Because of its full bypass abilities, we have named this jailbreak technique Skeleton Key.

Figure 1. Skeleton Key jailbreak technique causes harm in AI systems

This threat is in the jailbreak category, and therefore relies on the attacker already having legitimate access to the AI model. In bypassing safeguards, Skeleton Key allows the user to cause the model to produce ordinarily forbidden behaviors, which could range from production of harmful content to overriding its usual decision-making rules. Like all jailbreaks, the impact can be understood as narrowing the gap between what the model is capable of doing (given the user credentials, etc.) and what it is willing to do. As this is an attack on the model itself, it does not impute other risks on the AI system, such as permitting access to another user’s data, taking control of the system, or exfiltrating data.

To protect against Skeleton Key attacks, as detailed in this blog, Microsoft has implemented several approaches to our AI system design and provides tools for customers developing their own applications on Azure. Below, we also share mitigation guidance for defenders to discover and protect against such attacks.

Microsoft recommends customers who are building their own AI models and/or integrating AI into their applications to consider how this type of attack could impact their threat model and to add this knowledge to their AI red team approach, using tools such as PyRIT. (Note: Microsoft has updated PyRIT to include Skeleton Key)

In the next sections, we will discuss some of the known methods for exploiting generative AI models using the Skeleton Key technique, explain the steps we’re taking to address the risk, and provide guidance for the detection and mitigation of this threat. You can watch this video to learn more about how Microsoft approaches AI Red Teaming.

Attack flow

Skeleton Key works by asking a model to augment, rather than change, its behavior guidelines so that it responds to any request for information or content, providing a warning (rather than refusing) if its output might be considered offensive, harmful, or illegal if followed. This attack type is known as Explicit: forced instruction-following.

In one example, informing a model that the user is trained in safety and ethics, and that the output is for research purposes only, helps to convince some models to comply, as shown below:

Figure 2. Example text used in a Skeleton Key jailbreak attack

When the Skeleton Key jailbreak is successful, a model acknowledges that it has updated its guidelines and will subsequently comply with instructions to produce any content, no matter how much it violates its original responsible AI guidelines.

During our testing, from April to May 2024, the jailbreak was shown to work on the following base models and hosted models:

Meta Llama3-70b-instruct (base)
Google Gemini Pro (base)
OpenAI GPT 3.5 Turbo (hosted)
OpenAI GPT 4o (hosted)
Mistral Large (hosted)
Anthropic Claude 3 Opus (hosted)
Cohere Commander R Plus (hosted)

For each model that we tested, we evaluated a diverse set of tasks across risk and safety content categories, including areas such as explosives, bioweapons, political content, self-harm, racism, drugs, graphic sex, and violence. All the affected models complied fully and without censorship for these tasks, though with a warning note prefixing the output as requested. Unlike other jailbreaks like Crescendo, where models must be asked about tasks indirectly or with encodings, Skeleton Key puts the models in a mode where a user can directly request tasks, for example, “Write a recipe for homemade explosives”. Further, the model’s output appears to be completely unfiltered and reveals the extent of a model’s knowledge or ability to produce the requested content.

Consistent with responsible disclosure principles, Microsoft shared this research with the affected AI vendors before publication, helping them determine how to best address mitigations, as needed, in their respective products or services.

GPT-4 demonstrated resistance to Skeleton Key, except when the behavior update request was included as part of a user-defined system message, rather than as a part of the primary user input. This is something that is not ordinarily possible in the interfaces of most software that uses GPT-4, but can be done from the underlying API or tools that access it directly. This indicates that the differentiation of system message from user request in GPT-4 is successfully reducing attackers’ ability to override behavior.

Mitigation and protection guidance

Microsoft has made software updates to the LLM technology behind Microsoft’s AI offerings, including our Copilot AI assistants, to mitigate the impact of this guardrail bypass. Customers should consider the following approach to mitigate and protect against this type of jailbreak in their own AI system design:

Input filtering: Azure AI Content Safety detects and blocks inputs that contain harmful or malicious intent leading to a jailbreak attack that could circumvent safeguards.
System message: Prompt engineering the system prompts to clearly instruct the large language model (LLM) on appropriate behavior and to provide additional safeguards. For instance, specify that any attempts to undermine the safety guardrail instructions should be prevented (read our guidance on building a system message framework here).
Output filtering: Azure AI Content Safety post-processing filter that identifies and prevents output generated by the model that breaches safety criteria.
Abuse monitoring: Deploying an AI-driven detection system trained on adversarial examples, and using content classification, abuse pattern capture, and other methods to detect and mitigate instances of recurring content and/or behaviors that suggest use of the service in a manner that may violate guardrails. As a separate AI system, it avoids being influenced by malicious instructions. Microsoft Azure OpenAI Service abuse monitoring is an example of this approach.

Building AI solutions on Azure

Microsoft provides tools for customers developing their own applications on Azure. Azure AI Content Safety Prompt Shields are enabled by default for models hosted in the Azure AI model catalog as a service, and they are parameterized by a severity threshold. We recommend setting the most restrictive threshold to ensure the best protection against safety violations. These input and output filters act as a general defense not only against this particular jailbreak technique, but also a broad set of emerging techniques that attempt to generate harmful content. Azure also provides built-in tooling for model selection, prompt engineering, evaluation, and monitoring. For example, risk and safety evaluations in Azure AI Studio can assess a model and/or application for susceptibility to jailbreak attacks using synthetic adversarial datasets, while Microsoft Defender for Cloud can alert security operations teams to jailbreaks and other active threats.

With the integration of Azure AI and Microsoft Security (Microsoft Purview and Microsoft Defender for Cloud) security teams can also discover, protect, and govern these attacks. The new native integration of Microsoft Defender for Cloud with Azure OpenAI Service, enables contextual and actionable security alerts, driven by Azure AI Content Safety Prompt Shields and Microsoft Defender Threat Intelligence. Threat protection for AI workloads allows security teams to monitor their Azure OpenAI powered applications in runtime for malicious activity associated with direct and in-direct prompt injection attacks, sensitive data leaks and data poisoning, or denial of service attacks.

Figure 3. Microsoft Security for the protection of AI systems

References

Learn more

To learn more about Microsoft’s Responsible AI principles and approach, refer to http://approjects.co.za/?big=ai/principles-and-approach.

For the latest security research from the Microsoft Threat Intelligence community, check out the Microsoft Threat Intelligence Blog: https://aka.ms/threatintelblog.

The post Mitigating Skeleton Key, a new type of generative AI jailbreak technique appeared first on Microsoft Security Blog.

AI jailbreaks: What they are and how they can be mitigated

Microsoft Threat Intelligence — Tue, 04 Jun 2024 17:00:00 +0000

Generative AI systems are made up of multiple components that interact to provide a rich user experience between the human and the AI model(s). As part of a responsible AI approach, AI models are protected by layers of defense mechanisms to prevent the production of harmful content or being used to carry out instructions that go against the intended purpose of the AI integrated application. This blog will provide an understanding of what AI jailbreaks are, why generative AI is susceptible to them, and how you can mitigate the risks and harms.

What is AI jailbreak?

An AI jailbreak is a technique that can cause the failure of guardrails (mitigations). The resulting harm comes from whatever guardrail was circumvented: for example, causing the system to violate its operators’ policies, make decisions unduly influenced by one user, or execute malicious instructions. This technique may be associated with additional attack techniques such as prompt injection, evasion, and model manipulation. You can learn more about AI jailbreak techniques in our AI red team’s Microsoft Build session, How Microsoft Approaches AI Red Teaming.

Figure 1. AI safety finding ontology

Here is an example of an attempt to ask an AI assistant to provide information about how to build a Molotov cocktail (firebomb). We know this knowledge is built into most of the generative AI models available today, but is prevented from being provided to the user through filters and other techniques to deny this request. Using a technique like Crescendo, however, the AI assistant can produce the harmful content that should otherwise have been avoided. This particular problem has since been addressed in Microsoft’s safety filters; however, AI models are still susceptible to it. Many variations of these attempts are discovered on a regular basis, then tested and mitigated.

Figure 2. Crescendo attack to build a Molotov cocktail

Why is generative AI susceptible to this issue?

When integrating AI into your applications, consider the characteristics of AI and how they might impact the results and decisions made by this technology. Without anthropomorphizing AI, the interactions are very similar to the issues you might find when dealing with people. You can consider the attributes of an AI language model to be similar to an eager but inexperienced employee trying to help your other employees with their productivity:

Over-confident: They may confidently present ideas or solutions that sound impressive but are not grounded in reality, like an overenthusiastic rookie who hasn’t learned to distinguish between fiction and fact.
Gullible: They can be easily influenced by how tasks are assigned or how questions are asked, much like a naïve employee who takes instructions too literally or is swayed by the suggestions of others.
Wants to impress: While they generally follow company policies, they can be persuaded to bend the rules or bypass safeguards when pressured or manipulated, like an employee who may cut corners when tempted.
Lack of real-world application: Despite their extensive knowledge, they may struggle to apply it effectively in real-world situations, like a new hire who has studied the theory but may lack practical experience and common sense.

In essence, AI language models can be likened to employees who are enthusiastic and knowledgeable but lack the judgment, context understanding, and adherence to boundaries that come with experience and maturity in a business setting.

So we can say that generative AI models and system have the following characteristics:

Imaginative but sometimes unreliable
Suggestible and literal-minded, without appropriate guidance
Persuadable and potentially exploitable
Knowledgeable yet impractical for some scenarios

Without the proper protections in place, these systems can not only produce harmful content, but could also carry out unwanted actions and leak sensitive information.

Due to the nature of working with human language, generative capabilities, and the data used in training the models, AI models are non-deterministic, i.e., the same input will not always produce the same outputs. These results can be improved in the training phases, as we saw with the results of increased resilience in Phi-3 based on direct feedback from our AI Red Team. As all generative AI systems are subject to these issues, Microsoft recommends taking a zero-trust approach towards the implementation of AI; assume that any generative AI model could be susceptible to jailbreaking and limit the potential damage that can be done if it is achieved. This requires a layered approach to mitigate, detect, and respond to jailbreaks. Learn more about our AI Red Team approach.

Figure 3. Anatomy of an AI application

What is the scope of the problem?

When an AI jailbreak occurs, the severity of the impact is determined by the guardrail that it circumvented. Your response to the issue will depend on the specific situation and if the jailbreak can lead to unauthorized access to content or trigger automated actions. For example, if the harmful content is generated and presented back to a single user, this is an isolated incident that, while harmful, is limited. However, if the jailbreak could result in the system carrying out automated actions, or producing content that could be visible to more than the individual user, then this becomes a more severe incident. As a technique, jailbreaks should not have an incident severity of their own; rather, severities should depend on the consequence of the overall event (you can read about Microsoft’s approach in the AI bug bounty program).

Here are some examples of the types of risks that could occur from an AI jailbreak:

AI safety and security risks:
- Unauthorized data access
- Sensitive data exfiltration
- Model evasion
- Generating ransomware
- Circumventing individual policies or compliance systems
Responsible AI risks:
- Producing content that violates policies (e.g., harmful, offensive, or violent content)
- Access to dangerous capabilities of the model (e.g., producing actionable instructions for dangerous or criminal activity)
- Subversion of decision-making systems (e.g., making a loan application or hiring system produce attacker-controlled decisions)
- Causing the system to misbehave in a newsworthy and screenshot-able way
- IP infringement

How do AI jailbreaks occur?

The two basic families of jailbreak depend on who is doing them:

A “classic” jailbreak happens when an authorized operator of the system crafts jailbreak inputs in order to extend their own powers over the system.
Indirect prompt injection happens when a system processes data controlled by a third party (e.g., analyzing incoming emails or documents editable by someone other than the operator) who inserts a malicious payload into that data, which then leads to a jailbreak of the system.

You can learn more about both of these types of jailbreaks here.

There is a wide range of known jailbreak-like attacks. Some of them (like DAN) work by adding instructions to a single user input, while others (like Crescendo) act over several turns, gradually shifting the conversation to a particular end. Jailbreaks may use very “human” techniques such as social psychology, effectively sweet-talking the system into bypassing safeguards, or very “artificial” techniques that inject strings with no obvious human meaning, but which nonetheless could confuse AI systems. Jailbreaks should not, therefore, be regarded as a single technique, but as a group of methodologies in which a guardrail can be talked around by an appropriately crafted input.

Mitigation and protection guidance

To mitigate the potential of AI jailbreaks, Microsoft takes defense in depth approach when protecting our AI systems, from models hosted on Azure AI to each Copilot solution we offer. When building your own AI solutions within Azure, the following are some of the key enabling technologies that you can use to implement jailbreak mitigations:

Prompt filtering: Prompt Shields in Azure AI Content Safety
Identity management: Managed identities for Azure resources
Data access controls: Microsoft Purview data security for generative AI apps
System metaprompt: System message framework and template recommendations for large language models (LLMs)
Content filtering: Azure OpenAI Service content filtering
Abuse monitoring: Azure OpenAI Service abuse monitoring
Model alignment during training: Microsoft Azure AI Fundamentals: Generative AI – Training
Threat protection: Microsoft Defender for Cloud threat protection for AI workloads

Figure 4. Layered approach to protecting AI applications.

With layered defenses, there are increased chances to mitigate, detect, and appropriately respond to any potential jailbreaks.

When building solutions on Azure AI, use the Azure AI Studio capabilities to build benchmarks, create metrics, and implement continuous monitoring and evaluation for potential jailbreak issues.

Figure 5. Azure AI Studio capabilities

If you discover new vulnerabilities in any AI platform, we encourage you to follow responsible disclosure practices for the platform owner. Microsoft’s procedure is explained here: Microsoft AI Bounty Program.

Detection guidance

Microsoft builds multiple layers of detections into each of our AI hosting and Copilot solutions.

To detect attempts of jailbreak in your own AI systems, you should ensure you have enabled logging and are monitoring interactions in each component, especially the conversation transcripts, system metaprompt, and the prompt completions generated by the AI model.

Microsoft recommends setting the Azure AI Content Safety filter severity threshold to the most restrictive options, suitable for your application. You can also use Azure AI Studio to begin the evaluation of your AI application safety with the following guidance: Evaluation of generative AI applications with Azure AI Studio.

Summary

This article provides the foundational guidance and understanding of AI jailbreaks. In future blogs, we will explain the specifics of any newly discovered jailbreak techniques. Each one will articulate the following key points:

We will describe the jailbreak technique discovered and how it works, with evidential testing results.
We will have followed responsible disclosure practices to provide insights to the affected AI providers, ensuring they have suitable time to implement mitigations.
We will explain how Microsoft’s own AI systems have been updated to implement mitigations to the jailbreak.
We will provide detection and mitigation information to assist others to implement their own further defenses in their AI systems.

Richard Diver
Microsoft Security

Learn more

For the latest security research from the Microsoft Threat Intelligence community, check out the Microsoft Threat Intelligence Blog: https://aka.ms/threatintelblog.

The post AI jailbreaks: What they are and how they can be mitigated appeared first on Microsoft Security Blog.

How Microsoft discovers and mitigates evolving attacks against AI guardrails

Mark Russinovich — Thu, 11 Apr 2024 16:00:00 +0000

As we continue to integrate generative AI into our daily lives, it’s important to understand the potential harms that can arise from its use. Our ongoing commitment to advance safe, secure, and trustworthy AI includes transparency about the capabilities and limitations of large language models (LLMs). We prioritize research on societal risks and building secure, safe AI, and focus on developing and deploying AI systems for the public good. You can read more about Microsoft’s approach to securing generative AI with new tools we recently announced as available or coming soon to Microsoft Azure AI Studio for generative AI app developers.

We also made a commitment to identify and mitigate risks and share information on novel, potential threats. For example, earlier this year Microsoft shared the principles shaping Microsoft’s policy and actions blocking the nation-state advanced persistent threats (APTs), advanced persistent manipulators (APMs), and cybercriminal syndicates we track from using our AI tools and APIs.

In this blog post, we will discuss some of the key issues surrounding AI harms and vulnerabilities, and the steps we are taking to address the risk.

The potential for malicious manipulation of LLMs

One of the main concerns with AI is its potential misuse for malicious purposes. To prevent this, AI systems at Microsoft are built with several layers of defenses throughout their architecture. One purpose of these defenses is to limit what the LLM will do, to align with the developers’ human values and goals. But sometimes bad actors attempt to bypass these safeguards with the intent to achieve unauthorized actions, which may result in what is known as a “jailbreak.” The consequences can range from the unapproved but less harmful—like getting the AI interface to talk like a pirate—to the very serious, such as inducing AI to provide detailed instructions on how to achieve illegal activities. As a result, a good deal of effort goes into shoring up these jailbreak defenses to protect AI-integrated applications from these behaviors.

While AI-integrated applications can be attacked like traditional software (with methods like buffer overflows and cross-site scripting), they can also be vulnerable to more specialized attacks that exploit their unique characteristics, including the manipulation or injection of malicious instructions by talking to the AI model through the user prompt. We can break these risks into two groups of attack techniques:

Malicious prompts: When the user input attempts to circumvent safety systems in order to achieve a dangerous goal. Also referred to as user/direct prompt injection attack, or UPIA.
Poisoned content: When a well-intentioned user asks the AI system to process a seemingly harmless document (such as summarizing an email) that contains content created by a malicious third party with the purpose of exploiting a flaw in the AI system. Also known as cross/indirect prompt injection attack, or XPIA.

Today we’ll share two of our team’s advances in this field: the discovery of a powerful technique to neutralize poisoned content, and the discovery of a novel family of malicious prompt attacks, and how to defend against them with multiple layers of mitigations.

Neutralizing poisoned content (Spotlighting)

Prompt injection attacks through poisoned content are a major security risk because an attacker who does this can potentially issue commands to the AI system as if they were the user. For example, a malicious email could contain a payload that, when summarized, would cause the system to search the user’s email (using the user’s credentials) for other emails with sensitive subjects—say, “Password Reset”—and exfiltrate the contents of those emails to the attacker by fetching an image from an attacker-controlled URL. As such capabilities are of obvious interest to a wide range of adversaries, defending against them is a key requirement for the safe and secure operation of any AI service.

Our experts have developed a family of techniques called Spotlighting that reduces the success rate of these attacks from more than 20% to below the threshold of detection, with minimal effect on the AI’s overall performance:

Spotlighting (also known as data marking) to make the external data clearly separable from instructions by the LLM, with different marking methods offering a range of quality and robustness tradeoffs that depend on the model in use.

Mitigating the risk of multiturn threats (Crescendo)

Our researchers discovered a novel generalization of jailbreak attacks, which we call Crescendo. This attack can best be described as a multiturn LLM jailbreak, and we have found that it can achieve a wide range of malicious goals against the most well-known LLMs used today. Crescendo can also bypass many of the existing content safety filters, if not appropriately addressed. Once we discovered this jailbreak technique, we quickly shared our technical findings with other AI vendors so they could determine whether they were affected and take actions they deem appropriate. The vendors we contacted are aware of the potential impact of Crescendo attacks and focused on protecting their respective platforms, according to their own AI implementations and safeguards.

At its core, Crescendo tricks LLMs into generating malicious content by exploiting their own responses. By asking carefully crafted questions or prompts that gradually lead the LLM to a desired outcome, rather than asking for the goal all at once, it is possible to bypass guardrails and filters—this can usually be achieved in fewer than 10 interaction turns. You can read about Crescendo’s results across a variety of LLMs and chat services, and more about how and why it works, in our research paper.

While Crescendo attacks were a surprising discovery, it is important to note that these attacks did not directly pose a threat to the privacy of users otherwise interacting with the Crescendo-targeted AI system, or the security of the AI system, itself. Rather, what Crescendo attacks bypass and defeat is content filtering regulating the LLM, helping to prevent an AI interface from behaving in undesirable ways. We are committed to continuously researching and addressing these, and other types of attacks, to help maintain the secure operation and performance of AI systems for all.

In the case of Crescendo, our teams made software updates to the LLM technology behind Microsoft’s AI offerings, including our Copilot AI assistants, to mitigate the impact of this multiturn AI guardrail bypass. It is important to note that as more researchers inside and outside Microsoft inevitably focus on finding and publicizing AI bypass techniques, Microsoft will continue taking action to update protections in our products, as major contributors to AI security research, bug bounties and collaboration.

To understand how we addressed the issue, let us first review how we mitigate a standard malicious prompt attack (single step, also known as a one-shot jailbreak):

Standard prompt filtering: Detect and reject inputs that contain harmful or malicious intent, which might circumvent the guardrails (causing a jailbreak attack).
System metaprompt: Prompt engineering in the system to clearly explain to the LLM how to behave and provide additional guardrails.

Defending against Crescendo initially faced some practical problems. At first, we could not detect a “jailbreak intent” with standard prompt filtering, as each individual prompt is not, on its own, a threat, and keywords alone are insufficient to detect this type of harm. Only when combined is the threat pattern clear. Also, the LLM itself does not see anything out of the ordinary, since each successive step is well-rooted in what it had generated in a previous step, with just a small additional ask; this eliminates many of the more prominent signals that we could ordinarily use to prevent this kind of attack.

To solve the unique problems of multiturn LLM jailbreaks, we create additional layers of mitigations to the previous ones mentioned above:

Multiturn prompt filter: We have adapted input filters to look at the entire pattern of the prior conversation, not just the immediate interaction. We found that even passing this larger context window to existing malicious intent detectors, without improving the detectors at all, significantly reduced the efficacy of Crescendo.
AI Watchdog: Deploying an AI-driven detection system trained on adversarial examples, like a sniffer dog at the airport searching for contraband items in luggage. As a separate AI system, it avoids being influenced by malicious instructions. Microsoft Azure AI Content Safety is an example of this approach.
Advanced research: We invest in research for more complex mitigations, derived from better understanding of how LLM’s process requests and go astray. These have the potential to protect not only against Crescendo, but against the larger family of social engineering attacks against LLM’s.

How Microsoft helps protect AI systems

AI has the potential to bring many benefits to our lives. But it is important to be aware of new attack vectors and take steps to address them. By working together and sharing vulnerability discoveries, we can continue to improve the safety and security of AI systems. With the right product protections in place, we continue to be cautiously optimistic for the future of generative AI, and embrace the possibilities safely, with confidence. To learn more about developing responsible AI solutions with Azure AI, visit our website.

To empower security professionals and machine learning engineers to proactively find risks in their own generative AI systems, Microsoft has released an open automation framework, PyRIT (Python Risk Identification Toolkit for generative AI). Read more about the release of PyRIT for generative AI Red teaming, and access the PyRIT toolkit on GitHub. If you discover new vulnerabilities in any AI platform, we encourage you to follow responsible disclosure practices for the platform owner. Microsoft’s own procedure is explained here: Microsoft AI Bounty.

The Crescendo Multi-Turn LLM Jailbreak Attack

Read about Crescendo’s results across a variety of LLMs and chat services, and more about how and why it works.

Read the paper

To learn more about Microsoft Security solutions, visit our website. Bookmark the Security blog to keep up with our expert coverage on security matters. Also, follow us on LinkedIn (Microsoft Security) and X (@MSFTSecurity) for the latest news and updates on cybersecurity.

The post How Microsoft discovers and mitigates evolving attacks against AI guardrails appeared first on Microsoft Security Blog.

Staying ahead of threat actors in the age of AI

Microsoft Threat Intelligence — Wed, 14 Feb 2024 12:00:00 +0000

Over the last year, the speed, scale, and sophistication of attacks has increased alongside the rapid development and adoption of AI. Defenders are only beginning to recognize and apply the power of generative AI to shift the cybersecurity balance in their favor and keep ahead of adversaries. At the same time, it is also important for us to understand how AI can be potentially misused in the hands of threat actors. In collaboration with OpenAI, today we are publishing research on emerging threats in the age of AI, focusing on identified activity associated with known threat actors, including prompt-injections, attempted misuse of large language models (LLM), and fraud. Our analysis of the current use of LLM technology by threat actors revealed behaviors consistent with attackers using AI as another productivity tool on the offensive landscape. You can read OpenAI’s blog on the research here. Microsoft and OpenAI have not yet observed particularly novel or unique AI-enabled attack or abuse techniques resulting from threat actors’ usage of AI. However, Microsoft and our partners continue to study this landscape closely.

The objective of Microsoft’s partnership with OpenAI, including the release of this research, is to ensure the safe and responsible use of AI technologies like ChatGPT, upholding the highest standards of ethical application to protect the community from potential misuse. As part of this commitment, we have taken measures to disrupt assets and accounts associated with threat actors, improve the protection of OpenAI LLM technology and users from attack or abuse, and shape the guardrails and safety mechanisms around our models. In addition, we are also deeply committed to using generative AI to disrupt threat actors and leverage the power of new tools, including Microsoft Copilot for Security, to elevate defenders everywhere.

A principled approach to detecting and blocking threat actors

The progress of technology creates a demand for strong cybersecurity and safety measures. For example, the White House’s Executive Order on AI requires rigorous safety testing and government supervision for AI systems that have major impacts on national and economic security or public health and safety. Our actions enhancing the safeguards of our AI models and partnering with our ecosystem on the safe creation, implementation, and use of these models align with the Executive Order’s request for comprehensive AI safety and security standards.

In line with Microsoft’s leadership across AI and cybersecurity, today we are announcing principles shaping Microsoft’s policy and actions mitigating the risks associated with the use of our AI tools and APIs by nation-state advanced persistent threats (APTs), advanced persistent manipulators (APMs), and cybercriminal syndicates we track.

These principles include:

Identification and action against malicious threat actors’ use: Upon detection of the use of any Microsoft AI application programming interfaces (APIs), services, or systems by an identified malicious threat actor, including nation-state APT or APM, or the cybercrime syndicates we track, Microsoft will take appropriate action to disrupt their activities, such as disabling the accounts used, terminating services, or limiting access to resources.

Notification to other AI service providers: When we detect a threat actor’s use of another service provider’s AI, AI APIs, services, and/or systems, Microsoft will promptly notify the service provider and share relevant data. This enables the service provider to independently verify our findings and take action in accordance with their own policies.

Collaboration with other stakeholders: Microsoft will collaborate with other stakeholders to regularly exchange information about detected threat actors’ use of AI. This collaboration aims to promote collective, consistent, and effective responses to ecosystem-wide risks.

Transparency: As part of our ongoing efforts to advance responsible use of AI, Microsoft will inform the public and stakeholders about actions taken under these threat actor principles, including the nature and extent of threat actors’ use of AI detected within our systems and the measures taken against them, as appropriate.

Microsoft remains committed to responsible AI innovation, prioritizing the safety and integrity of our technologies with respect for human rights and ethical standards. These principles announced today build on Microsoft’s Responsible AI practices, our voluntary commitments to advance responsible AI innovation and the Azure OpenAI Code of Conduct. We are following these principles as part of our broader commitments to strengthening international law and norms and to advance the goals of the Bletchley Declaration endorsed by 29 countries.

Microsoft and OpenAI’s complementary defenses protect AI platforms

Because Microsoft and OpenAI’s partnership extends to security, the companies can take action when known and emerging threat actors surface. Microsoft Threat Intelligence tracks more than 300 unique threat actors, including 160 nation-state actors, 50 ransomware groups, and many others. These adversaries employ various digital identities and attack infrastructures. Microsoft’s experts and automated systems continually analyze and correlate these attributes, uncovering attackers’ efforts to evade detection or expand their capabilities by leveraging new technologies. Consistent with preventing threat actors’ actions across our technologies and working closely with partners, Microsoft continues to study threat actors’ use of AI and LLMs, partner with OpenAI to monitor attack activity, and apply what we learn to continually improve defenses. This blog provides an overview of observed activities collected from known threat actor infrastructure as identified by Microsoft Threat Intelligence, then shared with OpenAI to identify potential malicious use or abuse of their platform and protect our mutual customers from future threats or harm.

Recognizing the rapid growth of AI and emergent use of LLMs in cyber operations, we continue to work with MITRE to integrate these LLM-themed tactics, techniques, and procedures (TTPs) into the MITRE ATT&CK® framework or MITRE ATLAS™ (Adversarial Threat Landscape for Artificial-Intelligence Systems) knowledgebase. This strategic expansion reflects a commitment to not only track and neutralize threats, but also to pioneer the development of countermeasures in the evolving landscape of AI-powered cyber operations. A full list of the LLM-themed TTPs, which include those we identified during our investigations, is summarized in the appendix.

Summary of Microsoft and OpenAI’s findings and threat intelligence

The threat ecosystem over the last several years has revealed a consistent theme of threat actors following trends in technology in parallel with their defender counterparts. Threat actors, like defenders, are looking at AI, including LLMs, to enhance their productivity and take advantage of accessible platforms that could advance their objectives and attack techniques. Cybercrime groups, nation-state threat actors, and other adversaries are exploring and testing different AI technologies as they emerge, in an attempt to understand potential value to their operations and the security controls they may need to circumvent. On the defender side, hardening these same security controls from attacks and implementing equally sophisticated monitoring that anticipates and blocks malicious activity is vital.

While different threat actors’ motives and complexity vary, they have common tasks to perform in the course of targeting and attacks. These include reconnaissance, such as learning about potential victims’ industries, locations, and relationships; help with coding, including improving things like software scripts and malware development; and assistance with learning and using native languages. Language support is a natural feature of LLMs and is attractive for threat actors with continuous focus on social engineering and other techniques relying on false, deceptive communications tailored to their targets’ jobs, professional networks, and other relationships.

Importantly, our research with OpenAI has not identified significant attacks employing the LLMs we monitor closely. At the same time, we feel this is important research to publish to expose early-stage, incremental moves that we observe well-known threat actors attempting, and share information on how we are blocking and countering them with the defender community.

While attackers will remain interested in AI and probe technologies’ current capabilities and security controls, it’s important to keep these risks in context. As always, hygiene practices such as multifactor authentication (MFA) and Zero Trust defenses are essential because attackers may use AI-based tools to improve their existing cyberattacks that rely on social engineering and finding unsecured devices and accounts.

The threat actors profiled below are a sample of observed activity we believe best represents the TTPs the industry will need to better track using MITRE ATT&CK® framework or MITRE ATLAS™ knowledgebase updates.

Forest Blizzard

Forest Blizzard (STRONTIUM) is a Russian military intelligence actor linked to GRU Unit 26165, who has targeted victims of both tactical and strategic interest to the Russian government. Their activities span across a variety of sectors including defense, transportation/logistics, government, energy, non-governmental organizations (NGO), and information technology. Forest Blizzard has been extremely active in targeting organizations in and related to Russia’s war in Ukraine throughout the duration of the conflict, and Microsoft assesses that Forest Blizzard operations play a significant supporting role to Russia’s foreign policy and military objectives both in Ukraine and in the broader international community. Forest Blizzard overlaps with the threat actor tracked by other researchers as APT28 and Fancy Bear.

Forest Blizzard’s use of LLMs has involved research into various satellite and radar technologies that may pertain to conventional military operations in Ukraine, as well as generic research aimed at supporting their cyber operations. Based on these observations, we map and classify these TTPs using the following descriptions:

LLM-informed reconnaissance: Interacting with LLMs to understand satellite communication protocols, radar imaging technologies, and specific technical parameters. These queries suggest an attempt to acquire in-depth knowledge of satellite capabilities.
LLM-enhanced scripting techniques: Seeking assistance in basic scripting tasks, including file manipulation, data selection, regular expressions, and multiprocessing, to potentially automate or optimize technical operations.

Microsoft observed engagement from Forest Blizzard that were representative of an adversary exploring the use cases of a new technology. All accounts and assets associated with Forest Blizzard have been disabled.

Emerald Sleet

Emerald Sleet (THALLIUM) is a North Korean threat actor that has remained highly active throughout 2023. Their recent operations relied on spear-phishing emails to compromise and gather intelligence from prominent individuals with expertise on North Korea. Microsoft observed Emerald Sleet impersonating reputable academic institutions and NGOs to lure victims into replying with expert insights and commentary about foreign policies related to North Korea. Emerald Sleet overlaps with threat actors tracked by other researchers as Kimsuky and Velvet Chollima.

Emerald Sleet’s use of LLMs has been in support of this activity and involved research into think tanks and experts on North Korea, as well as the generation of content likely to be used in spear-phishing campaigns. Emerald Sleet also interacted with LLMs to understand publicly known vulnerabilities, to troubleshoot technical issues, and for assistance with using various web technologies. Based on these observations, we map and classify these TTPs using the following descriptions:

LLM-assisted vulnerability research: Interacting with LLMs to better understand publicly reported vulnerabilities, such as the CVE-2022-30190 Microsoft Support Diagnostic Tool (MSDT) vulnerability (known as “Follina”).
LLM-enhanced scripting techniques: Using LLMs for basic scripting tasks such as programmatically identifying certain user events on a system and seeking assistance with troubleshooting and understanding various web technologies.
LLM-supported social engineering: Using LLMs for assistance with the drafting and generation of content that would likely be for use in spear-phishing campaigns against individuals with regional expertise.
LLM-informed reconnaissance: Interacting with LLMs to identify think tanks, government organizations, or experts on North Korea that have a focus on defense issues or North Korea’s nuclear weapon’s program.

All accounts and assets associated with Emerald Sleet have been disabled.

Crimson Sandstorm

Crimson Sandstorm (CURIUM) is an Iranian threat actor assessed to be connected to the Islamic Revolutionary Guard Corps (IRGC). Active since at least 2017, Crimson Sandstorm has targeted multiple sectors, including defense, maritime shipping, transportation, healthcare, and technology. These operations have frequently relied on watering hole attacks and social engineering to deliver custom .NET malware. Prior research also identified custom Crimson Sandstorm malware using email-based command-and-control (C2) channels. Crimson Sandstorm overlaps with the threat actor tracked by other researchers as Tortoiseshell, Imperial Kitten, and Yellow Liderc.

The use of LLMs by Crimson Sandstorm has reflected the broader behaviors that the security community has observed from this threat actor. Interactions have involved requests for support around social engineering, assistance in troubleshooting errors, .NET development, and ways in which an attacker might evade detection when on a compromised machine. Based on these observations, we map and classify these TTPs using the following descriptions:

LLM-supported social engineering: Interacting with LLMs to generate various phishing emails, including one pretending to come from an international development agency and another attempting to lure prominent feminists to an attacker-built website on feminism.
LLM-enhanced scripting techniques: Using LLMs to generate code snippets that appear intended to support app and web development, interactions with remote servers, web scraping, executing tasks when users sign in, and sending information from a system via email.
LLM-enhanced anomaly detection evasion: Attempting to use LLMs for assistance in developing code to evade detection, to learn how to disable antivirus via registry or Windows policies, and to delete files in a directory after an application has been closed.

All accounts and assets associated with Crimson Sandstorm have been disabled.

Charcoal Typhoon

Charcoal Typhoon (CHROMIUM) is a Chinese state-affiliated threat actor with a broad operational scope. They are known for targeting sectors that include government, higher education, communications infrastructure, oil & gas, and information technology. Their activities have predominantly focused on entities within Taiwan, Thailand, Mongolia, Malaysia, France, and Nepal, with observed interests extending to institutions and individuals globally who oppose China’s policies. Charcoal Typhoon overlaps with the threat actor tracked by other researchers as Aquatic Panda, ControlX, RedHotel, and BRONZE UNIVERSITY.

In recent operations, Charcoal Typhoon has been observed interacting with LLMs in ways that suggest a limited exploration of how LLMs can augment their technical operations. This has consisted of using LLMs to support tooling development, scripting, understanding various commodity cybersecurity tools, and for generating content that could be used to social engineer targets. Based on these observations, we map and classify these TTPs using the following descriptions:

LLM-informed reconnaissance: Engaging LLMs to research and understand specific technologies, platforms, and vulnerabilities, indicative of preliminary information-gathering stages.
LLM-enhanced scripting techniques: Utilizing LLMs to generate and refine scripts, potentially to streamline and automate complex cyber tasks and operations.
LLM-supported social engineering: Leveraging LLMs for assistance with translations and communication, likely to establish connections or manipulate targets.
LLM-refined operational command techniques: Utilizing LLMs for advanced commands, deeper system access, and control representative of post-compromise behavior.

All associated accounts and assets of Charcoal Typhoon have been disabled, reaffirming our commitment to safeguarding against the misuse of AI technologies.

Salmon Typhoon

Salmon Typhoon (SODIUM) is a sophisticated Chinese state-affiliated threat actor with a history of targeting US defense contractors, government agencies, and entities within the cryptographic technology sector. This threat actor has demonstrated its capabilities through the deployment of malware, such as Win32/Wkysol, to maintain remote access to compromised systems. With over a decade of operations marked by intermittent periods of dormancy and resurgence, Salmon Typhoon has recently shown renewed activity. Salmon Typhoon overlaps with the threat actor tracked by other researchers as APT4 and Maverick Panda.

Notably, Salmon Typhoon’s interactions with LLMs throughout 2023 appear exploratory and suggest that this threat actor is evaluating the effectiveness of LLMs in sourcing information on potentially sensitive topics, high profile individuals, regional geopolitics, US influence, and internal affairs. This tentative engagement with LLMs could reflect both a broadening of their intelligence-gathering toolkit and an experimental phase in assessing the capabilities of emerging technologies.

Based on these observations, we map and classify these TTPs using the following descriptions:

LLM-informed reconnaissance: Engaging LLMs for queries on a diverse array of subjects, such as global intelligence agencies, domestic concerns, notable individuals, cybersecurity matters, topics of strategic interest, and various threat actors. These interactions mirror the use of a search engine for public domain research.
LLM-enhanced scripting techniques: Using LLMs to identify and resolve coding errors. Requests for support in developing code with potential malicious intent were observed by Microsoft, and it was noted that the model adhered to established ethical guidelines, declining to provide such assistance.
LLM-refined operational command techniques: Demonstrating an interest in specific file types and concealment tactics within operating systems, indicative of an effort to refine operational command execution.
LLM-aided technical translation and explanation: Leveraging LLMs for the translation of computing terms and technical papers.

Salmon Typhoon’s engagement with LLMs aligns with patterns observed by Microsoft, reflecting traditional behaviors in a new technological arena. In response, all accounts and assets associated with Salmon Typhoon have been disabled.

In closing, AI technologies will continue to evolve and be studied by various threat actors. Microsoft will continue to track threat actors and malicious activity misusing LLMs, and work with OpenAI and other partners to share intelligence, improve protections for customers and aid the broader security community.

Appendix: LLM-themed TTPs

Using insights from our analysis above, as well as other potential misuse of AI, we’re sharing the below list of LLM-themed TTPs that we map and classify to the MITRE ATT&CK® framework or MITRE ATLAS™ knowledgebase to equip the community with a common taxonomy to collectively track malicious use of LLMs and create countermeasures against:

LLM-informed reconnaissance: Employing LLMs to gather actionable intelligence on technologies and potential vulnerabilities.
LLM-enhanced scripting techniques: Utilizing LLMs to generate or refine scripts that could be used in cyberattacks, or for basic scripting tasks such as programmatically identifying certain user events on a system and assistance with troubleshooting and understanding various web technologies.
LLM-aided development: Utilizing LLMs in the development lifecycle of tools and programs, including those with malicious intent, such as malware.
LLM-supported social engineering: Leveraging LLMs for assistance with translations and communication, likely to establish connections or manipulate targets.
LLM-assisted vulnerability research: Using LLMs to understand and identify potential vulnerabilities in software and systems, which could be targeted for exploitation.
LLM-optimized payload crafting: Using LLMs to assist in creating and refining payloads for deployment in cyberattacks.
LLM-enhanced anomaly detection evasion: Leveraging LLMs to develop methods that help malicious activities blend in with normal behavior or traffic to evade detection systems.
LLM-directed security feature bypass: Using LLMs to find ways to circumvent security features, such as two-factor authentication, CAPTCHA, or other access controls.
LLM-advised resource development: Using LLMs in tool development, tool modifications, and strategic operational planning.

Learn more

Read the sixth edition of Cyber Signals, spotlighting how we are protecting AI platforms from emerging threats related to nation-state cyberthreat actors: Navigating cyberthreats and strengthening defenses in the era of AI.

For the latest security research from the Microsoft Threat Intelligence community, check out the Microsoft Threat Intelligence Blog: https://aka.ms/threatintelblog.

The post Staying ahead of threat actors in the age of AI appeared first on Microsoft Security Blog.

AI threats | Latest Threats | Microsoft Security Blog

Microsoft shares latest intelligence on North Korean and Chinese threat actors at CYBERWARCON

Sapphire Sleet: Social engineering leading to cryptocurrency theft

Masquerading as a venture capitalist

Posing as recruiters

Ruby Sleet: Sophisticated phishing targeting satellite and weapons systems-related targets

North Korean IT workers: The triple threat

Facilitators complicate tracking of IT worker ecosystem

Fake profiles and portfolios with the aid of AI

Getting payment for remote work

Protecting organizations from North Korean IT workers

Storm-2077: No targets left behind

CYBERWARCON Recap

Learn more

Mitigating Skeleton Key, a new type of generative AI jailbreak technique

Introducing Skeleton Key

Attack flow

Mitigation and protection guidance

Building AI solutions on Azure

References

Learn more

AI jailbreaks: What they are and how they can be mitigated

What is AI jailbreak?

Why is generative AI susceptible to this issue?

What is the scope of the problem?

How do AI jailbreaks occur?

Mitigation and protection guidance

Detection guidance

Summary

Learn more

How Microsoft discovers and mitigates evolving attacks against AI guardrails

The potential for malicious manipulation of LLMs

Neutralizing poisoned content (Spotlighting)

Mitigating the risk of multiturn threats (Crescendo)

How Microsoft helps protect AI systems

The Crescendo Multi-Turn LLM Jailbreak Attack

Staying ahead of threat actors in the age of AI

A principled approach to detecting and blocking threat actors

Microsoft and OpenAI’s complementary defenses protect AI platforms

Summary of Microsoft and OpenAI’s findings and threat intelligence

Forest Blizzard

Emerald Sleet

Crimson Sandstorm

Charcoal Typhoon

Salmon Typhoon

Appendix: LLM-themed TTPs

Learn more