{"id":80141,"date":"2018-02-14T06:43:26","date_gmt":"2018-02-14T14:43:26","guid":{"rendered":"https:\/\/cloudblogs.microsoft.com\/microsoftsecure\/?p=80141"},"modified":"2025-02-19T21:13:46","modified_gmt":"2025-02-20T05:13:46","slug":"how-artificial-intelligence-stopped-an-emotet-outbreak","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/security\/blog\/2018\/02\/14\/how-artificial-intelligence-stopped-an-emotet-outbreak\/","title":{"rendered":"How artificial intelligence stopped an Emotet outbreak"},"content":{"rendered":"\n

At 12:46 a.m. local time on February 3, a Windows 7 Pro customer in North Carolina became the first would-be victim of a new malware attack campaign for Trojan:Win32\/Emotet<\/a>. In the next 30 minutes, the campaign tried to attack over a thousand potential victims, all of whom were instantly and automatically protected by Windows Defender AV<\/a>.<\/p>\n\n\n\n

How did Windows Defender AV uncover the newly launched attack and block it at the outset? Through layered machine learning<\/a>, including use of both client-side and cloud machine learning (ML) models. Every day, artificial intelligence enables Windows Defender AV to stop countless malware outbreaks in their tracks. In this blog post, we\u2019ll take a detailed look at how the combination of client and cloud ML models detects new outbreaks.<\/p>\n\n\n\n

\"\"<\/figure>\n\n\n\n

Figure 1. Layered detection model in Windows Defender AV<\/em><\/p>\n\n\n\n

Client machine learning models<\/h2>\n\n\n\n

The first layer of machine learning protection is an array of lightweight ML models built right into the Windows Defender AV client that runs locally on your computer. Many of these models are specialized for file types commonly abused by malware authors, including JavaScript, Visual Basic Script, and Office macros. Some models target behavior detection, while other models are aimed at detecting portable executable (PE) files (.exe and .dll).<\/p>\n\n\n\n

In the case of the Emotet<\/a> outbreak on February 3, Windows Defender AV caught the attack using one of the PE gradient boosted tree ensemble models. This model classifies files based on a featurization of the assembly opcode sequence as the file is emulated, allowing the model to look at the file\u2019s behavior as it was simulated to run.<\/p>\n\n\n\n
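
To make the idea of featurizing an opcode sequence concrete, here is a minimal sketch that turns an emulated opcode trace into a fixed-length vector of normalized n-gram frequencies. It is an illustration under stated assumptions, not the featurization Windows Defender AV actually uses: the function names, the sample trace, and the bigram vocabulary are all hypothetical.

```python
from collections import Counter


def opcode_ngrams(opcodes, n=2):
    '''Return every run of n consecutive opcodes as a tuple.'''
    return [tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1)]


def featurize(opcodes, vocabulary, n=2):
    '''Map an opcode trace to normalized n-gram frequencies.

    vocabulary is a fixed, ordered list of n-grams, so every file
    yields a vector of the same length regardless of trace length.
    '''
    counts = Counter(opcode_ngrams(opcodes, n))
    total = sum(counts.values()) or 1  # avoid dividing by zero on short traces
    return [counts[gram] / total for gram in vocabulary]


# Hypothetical trace recorded while a file runs in an emulator.
trace = ['push', 'mov', 'xor', 'mov', 'xor', 'call']

# Hypothetical vocabulary of opcode bigrams chosen at training time.
vocabulary = [('mov', 'xor'), ('xor', 'mov'), ('xor', 'call'), ('push', 'pop')]

features = featurize(trace, vocabulary)
print(features)
```

A fixed vocabulary keeps the vector length constant across files, which is what a tree-based classifier needs as input.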

\"\"<\/figure>\n\n\n\n

Figure 2. A client ML model based on emulated-execution opcodes classified the Emotet outbreak as malicious.<\/em><\/p>\n\n\n\n

The tree ensemble was trained using LightGBM<\/a>, a Microsoft open-source framework used for high-performance gradient boosting.<\/p>\n\n\n\n

\"\"<\/figure>\n\n\n\n

Figure 3a. Visualization of the LightGBM-trained client ML model that successfully classified Emotet’s emulation behavior as malicious. The model combines a set of 20 decision trees to classify whether the file\u2019s emulated behavior sequence is malicious or not.<\/em><\/p>\n\n\n\n

\"\"<\/figure>\n\n\n\n

Figure 3b. A more detailed look at the first decision tree in the model. Each decision is based on the value of a different feature. Green triangles indicate a weighted-clean decision result for the tree; red triangles indicate a weighted-malware decision result.<\/em><\/p>\n\n\n\n
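
The way a tree ensemble turns per-tree decisions into a single verdict can be sketched in a few lines. The example below assumes the usual gradient boosting setup for binary classification, in which each tree contributes a signed leaf weight, the weights are summed, and a sigmoid maps the sum to a probability. The two toy trees, their thresholds, their leaf weights, and the feature vector are hypothetical and far smaller than the 20-tree model in the figures.

```python
import math


def leaf(value):
    '''A terminal node holding a signed weight (negative leans clean).'''
    return {'value': value}


def split(feature, threshold, left, right):
    '''An internal node that tests one feature against a threshold.'''
    return {'feature': feature, 'threshold': threshold,
            'left': left, 'right': right}


def tree_output(node, x):
    '''Walk one tree from the root to a leaf and return the leaf weight.'''
    while 'value' not in node:
        go_left = x[node['feature']] <= node['threshold']
        node = node['left'] if go_left else node['right']
    return node['value']


def raw_score(trees, x):
    '''Sum the leaf weights that every tree assigns to x.'''
    return sum(tree_output(tree, x) for tree in trees)


def malware_probability(trees, x):
    '''Map the summed raw score to a probability with a sigmoid.'''
    return 1.0 / (1.0 + math.exp(-raw_score(trees, x)))


# Two hypothetical trees; a real model would learn these from labeled data.
trees = [
    split(0, 0.3, leaf(-0.5), leaf(0.8)),
    split(2, 0.1, leaf(-0.2), split(1, 0.15, leaf(0.1), leaf(0.6))),
]

# Hypothetical feature values for one emulated file.
x = [0.4, 0.2, 0.2, 0.0]

print(raw_score(trees, x), malware_probability(trees, x))
```

Because each tree only adds a small signed weight, no single decision determines the result; the verdict comes from the accumulated evidence across all trees.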

When the client-based machine learning model predicts a high probability of maliciousness, a rich set of feature vectors is then prepared to describe the content. These feature vectors include:<\/p>\n\n\n\n