{"id":810376,"date":"2022-01-12T10:05:06","date_gmt":"2022-01-12T18:05:06","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?p=810376"},"modified":"2022-01-12T10:05:38","modified_gmt":"2022-01-12T18:05:38","slug":"ezpc-increased-data-security-in-the-ai-model-validation-process","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/research\/blog\/ezpc-increased-data-security-in-the-ai-model-validation-process\/","title":{"rendered":"EzPC: Increased data security in the AI model validation process"},"content":{"rendered":"\n
\"EzCc<\/figure>\n\n\n\n

From manufacturing and logistics to agriculture and transportation, the expansion of artificial intelligence (AI) in the last decade has revolutionized a multitude of industries. Examples include enhancing predictive analytics on the manufacturing floor and making microclimate predictions so that farmers can respond and save their crops in time. The adoption of AI is expected to accelerate in the coming years, underscoring the need for an efficient adoption process that preserves data privacy.

Currently, organizations that want to adopt AI into their workflow go through the process of model validation, in which they test, or validate, AI models from multiple vendors before selecting the one that best fits their needs. This is usually done with a test dataset that the organization provides. Unfortunately, the two options currently available for model validation are insufficient; both risk the exposure of data.

One of these options entails the AI vendor sharing their model with the organization, which can then validate the model on its test dataset. However, by doing this, the AI vendor risks exposing its intellectual property, which it undoubtedly wants to protect. The second option, equally risky, involves the organization sharing its test dataset with the AI vendor. This is problematic on two fronts. First, it risks exposing a dataset with sensitive information. Second, the AI vendor could use the test dataset to train the AI model, overfitting the model to the test dataset so that its results merely appear credible. To accurately assess how an AI model performs on a test dataset, it's critical that the model not be trained on it. Currently, these concerns are addressed by complex legal agreements, which often take several months to draft and execute, creating a substantial delay in the AI adoption process.
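To make the overfitting risk concrete, here is a minimal sketch (using scikit-learn on synthetic data; it is purely illustrative and not part of EzPC) of how evaluating a model on data it was trained on inflates its apparent accuracy:

```python
# Illustrative only: shows why a model must be validated on data it has
# never seen. Assumes scikit-learn; synthetic data stands in for a real
# test dataset.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print("accuracy on training data:", model.score(X_train, y_train))  # near 1.0
print("accuracy on held-out data:", model.score(X_test, y_test))    # realistic

# If a vendor had trained on the organization's test set, the second number
# would look as good as the first -- exactly what validation must rule out.
```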

The risk of data exposure and the need for legal agreements are compounded in the healthcare domain, where patient data, which makes up the test dataset, is incredibly sensitive, and there are strict privacy regulations with which both organizations must comply. Additionally, not only does the vendor's AI model contain proprietary intellectual property, but it may also encode sensitive patient information from the training data used to develop it. This makes for a challenging predicament. On one hand, healthcare organizations want to quickly adopt AI because of its enormous potential in applications such as understanding health risks in patients, predicting and diagnosing diseases, and developing personalized health interventions. On the other hand, there's a fast-growing list of AI vendors in the healthcare space to choose from (currently over 200), making the cumulative legal paperwork of AI validation daunting.

EzPC: Easy Secure Multi-Party Computation

We're very interested in accelerating the AI model validation process while also ensuring dataset and model privacy. For this reason, we built Easy Secure Multi-party Computation (EzPC). This open-source framework is the result of a collaboration among researchers with backgrounds in cryptography, programming languages, machine learning (ML), and security. At its core, EzPC is based on secure multiparty computation (MPC), a suite of cryptographic protocols that enable multiple parties to collaboratively compute a function on their private data without revealing that data to one another or to any other party. This functionality makes AI model validation an ideal use case for MPC.
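To give a flavor of the core idea, here is a minimal sketch of additive secret sharing, one of the simplest building blocks behind MPC. This toy code is hypothetical and is not EzPC's actual protocol; it only illustrates how parties can jointly compute a sum without revealing their inputs:

```python
# Toy additive secret sharing over a prime field. Illustrative only; real
# MPC protocols (including EzPC's) are far more sophisticated.
import secrets

P = 2**61 - 1  # a public prime modulus; all arithmetic is done mod P

def share(x, n=3):
    """Split secret x into n random shares that sum to x mod P."""
    shares = [secrets.randbelow(P) for _ in range(n - 1)]
    shares.append((x - sum(shares)) % P)
    return shares

def reconstruct(shares):
    """Recover the secret by summing all shares mod P."""
    return sum(shares) % P

# Two parties secret-share their private inputs among three servers...
alice_shares = share(123)
bob_shares = share(456)

# ...each server locally adds the shares it holds. No single share reveals
# anything about either input.
sum_shares = [(a + b) % P for a, b in zip(alice_shares, bob_shares)]

# Only the final result is opened: 123 + 456, with both inputs kept private.
print(reconstruct(sum_shares))  # 579
```

Addition composes cheaply like this because shares can be combined locally; as the next paragraph notes, the difficulty lies in efficiently combining many different kinds of operations, which is exactly what ML workloads demand.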

However, while MPC has been around for almost four decades, it's rarely deployed because building scalable and efficient MPC protocols requires deep cryptography expertise. Additionally, while MPC performs well when computing small or simple stand-alone functions, combining several different kinds of functions, which is fundamental to ML applications, is much harder and inefficient without specialized expertise.
