dp-transformers

Training transformer models with differential privacy

Transformer models have recently taken the field of Natural Language Processing (NLP) by storm, as large language models based on the transformer architecture have shown impressive performance across a wide range of applications. However, from a Responsible AI perspective, a valid concern remains: privacy-preserving techniques must be properly applied when these models are trained on private data.

Differential Privacy (DP) has become a gold-standard definition of privacy, offering rigorous privacy guarantees to individuals while still enabling learning from a population. Among its many applications, training machine learning models with DP has particular potential to extract great value from private data while protecting the privacy of the participants.

Motivated by our recent work, we are releasing a repository for training transformer models with differential privacy. Our repository, available at https://www.github.com/microsoft/dp-transformers, is based on integrating the Opacus library into the Hugging Face platform. We aim to help the privacy-preserving ML community use state-of-the-art models while respecting the privacy of the individuals whose data these models learn from.
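
To make the idea of training with DP more concrete, here is a minimal, library-independent sketch of a single DP-SGD update, the technique commonly used for differentially private training: each example's gradient is clipped to a fixed norm, the clipped gradients are summed, and Gaussian noise scaled to that norm is added before the update. This is an illustration only. It does not use the dp-transformers or Opacus APIs, the function and parameter names (dp_sgd_step, max_norm, noise_multiplier) are hypothetical, and it does not include the privacy accounting needed to report an actual privacy guarantee.

```python
import math
import random


def clip(grad, max_norm):
    """Scale a gradient vector so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_sgd_step(weights, per_example_grads, lr, max_norm, noise_multiplier, rng):
    """One illustrative DP-SGD update (hypothetical helper, not a library API).

    1. Clip each example's gradient to L2 norm <= max_norm.
    2. Sum the clipped gradients.
    3. Add Gaussian noise with std = noise_multiplier * max_norm per coordinate.
    4. Average over the batch and take a gradient step.
    """
    batch_size = len(per_example_grads)
    clipped = [clip(g, max_norm) for g in per_example_grads]
    summed = [sum(column) for column in zip(*clipped)]
    sigma = noise_multiplier * max_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [w - lr * n / batch_size for w, n in zip(weights, noisy)]


# Toy usage: two per-example gradients for a two-parameter model.
rng = random.Random(0)
weights = [0.0, 0.0]
grads = [[3.0, 4.0], [0.3, 0.4]]  # L2 norms 5.0 and 0.5
new_weights = dp_sgd_step(weights, grads, lr=1.0, max_norm=1.0,
                          noise_multiplier=1.0, rng=rng)
```

Clipping bounds how much any single example can influence the update, and the noise is calibrated to that bound. A larger noise_multiplier gives stronger privacy at the cost of a noisier update.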