{"id":761911,"date":"2021-11-09T07:59:00","date_gmt":"2021-11-09T15:59:00","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-group&p=761911"},"modified":"2024-06-20T05:20:13","modified_gmt":"2024-06-20T12:20:13","slug":"privacy-preserving-machine-learning-innovation","status":"publish","type":"msr-group","link":"https:\/\/www.microsoft.com\/en-us\/research\/group\/privacy-preserving-machine-learning-innovation\/","title":{"rendered":"Privacy Preserving Machine Learning Innovation"},"content":{"rendered":"
\n\t
\n\t\t
\n\t\t\t\"abstract\t\t<\/div>\n\t\t\n\t\t
\n\t\t\t\n\t\t\t
\n\t\t\t\t\n\t\t\t\t
\n\t\t\t\t\t\n\t\t\t\t\t
\n\t\t\t\t\t\t
\n\t\t\t\t\t\t\t\n\t\t\t\t\t\t\t\n\n

# Privacy Preserving Machine Learning Innovation
## A holistic approach to PPML
Recent research has shown that deploying ML models can, in some cases, implicate privacy in unexpected ways. For example, pretrained public language models that are fine-tuned on private data can be misused to recover private information, and very large language models have been shown to memorize training examples, potentially encoding personally identifying information (PII). Finally, inferring that a specific user was part of the training data can also impact privacy. At Microsoft Research, we believe it's critical to apply multiple techniques to achieve privacy and confidentiality; no single method can address all aspects alone. This is why we developed the Privacy Preserving Machine Learning (PPML) initiative to preserve the privacy and confidentiality of customer information while enabling next-generation productivity scenarios. With PPML, we take a three-pronged approach: first, we work to understand the risks and requirements around privacy and confidentiality; next, we work to measure the risks; and finally, we work to mitigate the potential for breaches of privacy. We explain the details of this multi-faceted approach below as well as in this blog post.
**Understand:** We work to understand the risk of customer data leakage and potential privacy attacks in a way that helps determine confidentiality properties of ML pipelines. In addition, we believe it's critical to proactively align with policy makers. We take into account local and international laws and guidance regulating data privacy, such as the General Data Protection Regulation (GDPR) and the EU's policy on trustworthy AI. We then map these legal principles, our contractual obligations, and responsible AI principles to our technical requirements and develop tools to communicate with policy makers how we meet these requirements.
**Measure:** Once we understand the risks to privacy and the requirements we must adhere to, we define metrics that can quantify the identified risks and track success towards mitigating them.
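One concrete way to quantify leakage risk is the success rate of a membership-inference attack against a trained model. The sketch below, a minimal illustration rather than the metric used in any particular Microsoft pipeline, scores the classic loss-threshold attack with AUC: the closer the score is to 0.5, the less the model reveals about whether a record was part of its training set. The model and data loaders are assumed to exist; all names are illustrative.

```python
# Minimal membership-inference metric: loss-threshold attack scored with AUC.
# Hypothetical example; `model`, `train_loader`, and `test_loader` are assumed to exist.
import numpy as np
import torch
import torch.nn.functional as F
from sklearn.metrics import roc_auc_score

@torch.no_grad()
def per_example_losses(model, loader):
    """Return the cross-entropy loss of each example under the model."""
    model.eval()
    losses = []
    for inputs, labels in loader:
        logits = model(inputs)
        losses.append(F.cross_entropy(logits, labels, reduction="none"))
    return torch.cat(losses).cpu().numpy()

def membership_inference_auc(model, train_loader, test_loader):
    """AUC of an attacker who predicts 'member' when the loss is low.
    0.5 means the attack is no better than guessing; 1.0 means total leakage."""
    member_losses = per_example_losses(model, train_loader)       # records seen in training
    non_member_losses = per_example_losses(model, test_loader)    # held-out records
    scores = -np.concatenate([member_losses, non_member_losses])  # lower loss -> higher membership score
    labels = np.concatenate([np.ones_like(member_losses), np.zeros_like(non_member_losses)])
    return roc_auc_score(labels, scores)
```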

**Mitigate:** We then develop and apply mitigation strategies, such as differential privacy (DP), described in more detail in this blog post. After we apply mitigation strategies, we measure their success and use our findings to refine our PPML approach.
Several different technologies and processes contribute to PPML, and we implement them for a number of different use cases, including threat modeling and preventing the leakage of training data. PPML strives to provide a holistic approach to unlock the full potential of customer data for intelligent features while honoring our commitment to privacy and confidentiality.
#### Confidential AI
Our goal is to make Azure the most trustworthy cloud platform for AI. The platform we envisage offers confidentiality and integrity against privileged attackers, including attacks on the code, data, and hardware supply chains; performance close to that offered by GPUs; and programmability with state-of-the-art ML frameworks.
#### Privacy in AI (PAI)
The M365 Research Privacy in AI group explores questions related to user privacy and confidentiality in machine learning. Our workstreams consider problems in modeling privacy threats, measuring privacy loss in AI systems, and mitigating identified risks, including applications of differential privacy, federated learning, and secure multi-party computation.
#### Differential Privacy: Project Laplace
Differential Privacy (DP) is the gold standard of privacy protection, with a vast body of academic literature and a growing number of large-scale deployments across industry and government. In machine learning scenarios, DP works by adding small amounts of statistical random noise during training, the purpose of which is to conceal the contributions of individual parties. When DP is employed, a mathematical proof ensures that the final ML model learns only general trends in the data without acquiring information specific to individual parties. To expand the scope of scenarios where DP can be successfully applied, we push the boundaries of the state of the art in DP training algorithms to address the issues of scalability, efficiency, and privacy/utility trade-offs.
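To make the noise-addition idea concrete, here is a minimal sketch of a single DP-SGD step: per-example gradients are clipped to a norm bound, and Gaussian noise scaled to that bound is added before the parameter update. This is an illustrative toy (plain NumPy, logistic regression), not Project Laplace's training code; `clip_norm` and `noise_multiplier` are simply the standard DP-SGD hyperparameters.

```python
# Toy DP-SGD step for logistic regression (illustrative only).
import numpy as np

def dp_sgd_step(w, X_batch, y_batch, lr=0.1, clip_norm=1.0, noise_multiplier=1.1, rng=None):
    """One differentially private SGD step: clip per-example gradients, then add Gaussian noise."""
    rng = rng or np.random.default_rng()
    per_example_grads = []
    for x, y in zip(X_batch, y_batch):
        p = 1.0 / (1.0 + np.exp(-x @ w))                  # sigmoid prediction
        g = (p - y) * x                                   # per-example gradient
        norm = np.linalg.norm(g)
        g = g * min(1.0, clip_norm / (norm + 1e-12))      # clip to bound each example's influence
        per_example_grads.append(g)
    grad_sum = np.sum(per_example_grads, axis=0)
    # Gaussian noise calibrated to the clipping bound conceals individual contributions.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=w.shape)
    noisy_mean_grad = (grad_sum + noise) / len(X_batch)
    return w - lr * noisy_mean_grad
```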

#### Project FLUTE
The goal of FLUTE is to create technologies that allow model training on private data without central curation. We apply techniques from federated learning, differential privacy, and high-performance computing to enable cross-silo model training with strong experimental results. We have released FLUTE as an open-source toolkit on GitHub.
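At the core of this setting is the federated-averaging idea: each silo trains locally on data that never leaves it, and only model updates are aggregated centrally. The sketch below is a bare-bones FedAvg round in NumPy, not the FLUTE API; `local_sgd` and the client data structures are hypothetical stand-ins.

```python
# Bare-bones federated averaging round (illustrative; not the FLUTE API).
import numpy as np

def local_sgd(weights, X, y, lr=0.05, epochs=1):
    """Hypothetical local training loop: a few SGD epochs of least-squares regression."""
    w = weights.copy()
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

def fedavg_round(global_weights, clients):
    """One round: each client trains locally; the server averages updates weighted by data size."""
    updates, sizes = [], []
    for X, y in clients:                          # raw data stays on the client
        updates.append(local_sgd(global_weights, X, y))
        sizes.append(len(y))
    return np.average(updates, axis=0, weights=np.asarray(sizes, dtype=float))

# Usage with synthetic silos.
rng = np.random.default_rng(0)
clients = [(rng.normal(size=(50, 3)), rng.normal(size=50)) for _ in range(4)]
w = np.zeros(3)
for _ in range(10):
    w = fedavg_round(w, clients)
```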

## Tools
### Privacy Random Variable (PRV) Accountant
A fast algorithm to optimally compose privacy guarantees of differentially private (DP) mechanisms to arbitrary accuracy.
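The accountant answers questions such as: after N noisy training steps at a given noise multiplier and sampling rate, what (ε, δ) guarantee holds? The snippet below sketches how the open-source `prv_accountant` package can be used for this; the class and argument names follow the public repository as we recall them but may differ between releases, so treat them as an approximation and consult the README.

```python
# Sketch: composing many DP-SGD steps with the PRV accountant.
# Class/argument names follow the public prv_accountant package but may vary by version.
from prv_accountant import Accountant

accountant = Accountant(
    noise_multiplier=1.1,            # std of Gaussian noise relative to the clipping norm
    sampling_probability=256 / 50000,  # batch size / dataset size (Poisson subsampling)
    delta=1e-5,
    eps_error=0.1,                   # target accuracy of the epsilon estimate
    max_compositions=10000,
)

# Lower bound, estimate, and upper bound on epsilon after 10,000 steps.
eps_low, eps_est, eps_up = accountant.compute_epsilon(num_compositions=10000)
print(f"epsilon ~ {eps_est:.2f} (in [{eps_low:.2f}, {eps_up:.2f}]) at delta=1e-5")
```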

### DP-Transformers
Motivated by our recent work, we are releasing a repository for training transformer models with differential privacy. Our GitHub repository is based on integrating the Opacus library with the Hugging Face platform. We aim to serve the privacy-preserving ML community in utilizing state-of-the-art models while respecting the privacy of the individuals whose data these models learn from.
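For readers who want a feel for what this integration involves, the snippet below shows the general pattern of wrapping a Hugging Face model and its optimizer with Opacus's `PrivacyEngine` so that fine-tuning runs DP-SGD. It is a simplified sketch of the underlying libraries, not the dp-transformers API itself; the tiny toy dataset, hyperparameters, and omitted epsilon-accounting loop are illustrative assumptions.

```python
# Simplified sketch of the pattern: DP fine-tuning of a Hugging Face model with Opacus.
# This is not the dp-transformers API; it only illustrates the underlying integration.
import torch
from torch.utils.data import DataLoader, TensorDataset
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from opacus import PrivacyEngine

model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased", num_labels=2)
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

# Tiny toy dataset standing in for real (private) training data.
texts = ["meeting notes", "public announcement", "draft contract", "press release"]
labels = torch.tensor([1, 0, 1, 0])
enc = tokenizer(texts, padding=True, return_tensors="pt")
train_loader = DataLoader(TensorDataset(enc["input_ids"], enc["attention_mask"], labels), batch_size=2)

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
privacy_engine = PrivacyEngine()
model, optimizer, train_loader = privacy_engine.make_private(
    module=model,
    optimizer=optimizer,
    data_loader=train_loader,
    noise_multiplier=1.0,     # scale of Gaussian noise added to clipped gradients
    max_grad_norm=1.0,        # per-example gradient clipping bound
    poisson_sampling=False,   # keep the toy loader's fixed batches for this sketch
)

model.train()
for input_ids, attention_mask, batch_labels in train_loader:
    optimizer.zero_grad()
    out = model(input_ids=input_ids, attention_mask=attention_mask, labels=batch_labels)
    out.loss.backward()       # Opacus computes, clips, and noises per-example gradients
    optimizer.step()
```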

## Related research
### EzPC (Easy Secure Multi-party Computation)
The EzPC project focuses on providing a scalable, performant, and usable system for secure Multi-Party Computation (MPC). MPC, through cryptographic protocols, allows multiple parties with sensitive information to compute joint functions on their data without sharing the data in the clear with any entity. In the context of machine learning, an example of such a task is secure inference, where a model owner can offer inference as a service to a data owner without either entity seeing any data in the clear. The EzPC system automatically generates MPC protocols for this task from standard TensorFlow/ONNX code.
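The building block behind such protocols is secret sharing: each private value is split into random shares so that no single party learns anything, yet the parties can still compute on the shares. The sketch below illustrates two-party additive secret sharing over a prime field for an addition and a public-constant multiplication; it is a didactic toy, not the protocol EzPC generates (which also handles products of secret values, comparisons, and full neural networks).

```python
# Toy two-party additive secret sharing over a prime field (didactic only).
import secrets

P = 2**61 - 1  # public prime modulus

def share(x):
    """Split x into two random shares that sum to x mod P."""
    s0 = secrets.randbelow(P)
    s1 = (x - s0) % P
    return s0, s1   # give s0 to party 0, s1 to party 1

def reconstruct(s0, s1):
    return (s0 + s1) % P

# Each party secret-shares its value; neither ever sees the other's input in the clear.
a0, a1 = share(42)   # party A's private value
b0, b1 = share(17)   # party B's private value

# Addition and multiplication by a public constant work locally on the shares.
sum0, sum1 = (a0 + b0) % P, (a1 + b1) % P
scaled0, scaled1 = (3 * a0) % P, (3 * a1) % P

assert reconstruct(sum0, sum1) == 59        # 42 + 17
assert reconstruct(scaled0, scaled1) == 126  # 3 * 42
```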

\"diagram<\/a><\/figure><\/div>\n\n\n","protected":false},"excerpt":{"rendered":"

## People

**Executive sponsors:** Jim Kleewein, Jaime Teevan

**Leadership team:** Saravan Rajmohan, Ryen W. White, Manuel Costa, Morris Kabuage, Kieran McDonald, danah boyd

**Team:** Victor Ruehle, Kim Laine, Melissa Chase, Boris Köpf, Nishanth Chandran, Robert Sim, Sergey Yekhanin, Daniel Jones, Lukas Wutschitz, Shruti Tople, Santiago Zanella-Béguelin, Andrew Paverd, Janardhan (Jana) Kulkarni, Sivakanth Gopi, Arturs Backurs, Esha Ghosh, Huseyin Atahan Inan, Sepideh Mahabadi, Divya Gupta, Rahul Sharma, Aseem Rastogi, Kapil Vaswani, Antoine Delignat-Lavaud, Stavros Volos, Cédric Fournet, Xuchao Zhang, Molly Xia, Zinan Lin, Gbola Afonja, Giovanni Cherubin

## Research
### Confidential AI

Our goal is to make Azure the most trustworthy cloud platform for AI. The platform we envisage offers confidentiality and integrity against privileged attackers, including attacks on the code, data, and hardware supply chains; performance close to that offered by GPUs; and programmability with state-of-the-art ML frameworks. The confidential AI platform will enable multiple entities to collaborate and train accurate models using sensitive data, and serve these models with assurance that their data and models…
### Project FTL

A novel framework for training models in a federated learning fashion. The project is one of the first attempts to introduce federated learning to speech recognition tasks. Beyond the novelty of the task, the paper describes an easily generalizable FL platform and some of the design decisions used for this task. Among the novel algorithms introduced are a new hierarchical optimization scheme, a gradient selection algorithm, and self-supervised training algorithms.
### Real World Reinforcement Learning

Real World Reinforcement Learning (Real-World RL) projects enable the next generation of machine learning using interactive reinforcement-based approaches to solve real-world problems.
### EzPC (Easy Secure Multi-party Computation)

Consider the following scenario: two hospitals, each having sensitive patient data, must compute statistical information about their joint data. Privacy regulations forbid them from sharing data in the clear with any entity. So, can they compute this information while keeping their private data encrypted (or "hidden") from each other? Cryptography, and specifically the primitive Secure Multi-Party Computation (MPC), provides an answer to this seemingly impossible task using sophisticated mathematical protocols. However, two big challenges remain:…
### Project SPIRAL

### Project Florida

### Confidential Computing

### Privacy in ML
### Differential Privacy

Differential privacy (DP) is widely recognized as a gold standard of privacy protection due to its mathematical rigor. Through the lens of differential privacy, we can design machine learning algorithms that responsibly train models on private data. However, it is challenging to apply differentially private stochastic gradient descent (DP-SGD) to large deep neural network models because of the dimensional dependence of DP: a larger model usually leads to worse performance in order to guarantee the same level of differential privacy. This is an essential barrier to applying DP in the deep learning era, where state-of-the-art performance demands large models.
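A quick way to see this dimensional dependence: for a fixed clipping norm and noise multiplier, DP-SGD adds independent Gaussian noise to every coordinate of the gradient, so the norm of the injected noise grows roughly as the square root of the number of parameters, while the signal stays bounded by the clipping norm times the batch size. The snippet below is a small illustrative calculation, not a formal argument; the hyperparameter values are arbitrary.

```python
# Illustration: noise-to-signal ratio in DP-SGD grows with model dimension.
import numpy as np

rng = np.random.default_rng(0)
clip_norm, noise_multiplier, batch_size = 1.0, 1.0, 512

for dim in [10_000, 1_000_000, 10_000_000]:
    # The clipped, summed batch gradient has norm at most clip_norm * batch_size (the "signal").
    signal = clip_norm * batch_size
    # DP-SGD adds Gaussian noise with std noise_multiplier * clip_norm to every coordinate.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=dim)
    ratio = np.linalg.norm(noise) / signal
    print(f"dim={dim:>10,}  noise norm ~ {np.linalg.norm(noise):9.1f}  noise/signal ~ {ratio:.2f}")
```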