Selective Pre-training for Private Fine-tuning

TMLR

Suppose we want to train text prediction models in email clients or word processors. The models must preserve the privacy of user data and adhere to a specific fixed size to meet memory and inference-time requirements. We introduce a generic framework to solve this problem. Specifically, we are given a public dataset D_pub and a private dataset D_priv corresponding to a downstream task T. How should we pre-train a fixed-size model M on D_pub and fine-tune it on D_priv such that the performance of M on T is maximized and M satisfies differential privacy with respect to D_priv? We show that pre-training on a subset of D_pub that brings the public distribution closer to the private distribution is a crucial ingredient for maximizing the transfer-learning ability of M after pre-training, especially in regimes where model sizes are relatively small. Beyond performance improvements, our framework also shows that with careful pre-training and private fine-tuning, smaller models can match the performance of much larger models, highlighting the promise of differentially private training as a tool for model compression and efficiency.
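
To make the three stages of the pipeline concrete (select a public subset, pre-train on it, privately fine-tune on D_priv), here is a minimal sketch. It is illustrative only and not the paper's implementation: the toy datasets, the SmallLM model, the score_fn similarity score, the keep_frac threshold, and the DP-SGD hyperparameters (noise_multiplier, max_grad_norm, the choice of Opacus) are all placeholder assumptions.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, Subset, TensorDataset
from opacus import PrivacyEngine

VOCAB, SEQ_LEN = 1000, 32

# Toy stand-ins for the public corpus D_pub and the private corpus D_priv.
d_pub = TensorDataset(torch.randint(0, VOCAB, (5000, SEQ_LEN)))
d_priv = TensorDataset(torch.randint(0, VOCAB, (500, SEQ_LEN)))

# Fixed-size target model M: a tiny bigram-style language model built only
# from Opacus-compatible layers (Embedding + Linear).
class SmallLM(nn.Module):
    def __init__(self, vocab, dim=128):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.head = nn.Linear(dim, vocab)

    def forward(self, x):              # x: (batch, seq_len) token ids
        return self.head(self.emb(x))  # next-token logits per position

def lm_loss(model, tokens):
    logits = model(tokens[:, :-1])
    targets = tokens[:, 1:]
    return nn.functional.cross_entropy(logits.reshape(-1, VOCAB), targets.reshape(-1))

# Step 1 -- selective pre-training data: score each public example by its
# similarity to the private distribution and keep the top fraction.
# score_fn is a random placeholder; in practice it would be some measure of
# closeness to D_priv (e.g., a domain classifier trained with privacy safeguards).
def score_fn(tokens):
    return torch.rand(()).item()  # placeholder similarity score

def select_subset(dataset, score_fn, keep_frac=0.2):
    scores = torch.tensor([score_fn(dataset[i][0]) for i in range(len(dataset))])
    top = scores.topk(int(keep_frac * len(dataset))).indices.tolist()
    return Subset(dataset, top)

d_pub_selected = select_subset(d_pub, score_fn)

# Step 2 -- ordinary (non-private) pre-training on the selected public subset.
model = SmallLM(VOCAB)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for (tokens,) in DataLoader(d_pub_selected, batch_size=64, shuffle=True):
    opt.zero_grad()
    lm_loss(model, tokens).backward()
    opt.step()

# Step 3 -- differentially private fine-tuning on D_priv with DP-SGD
# (here via Opacus); noise_multiplier and max_grad_norm are illustrative values.
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
priv_loader = DataLoader(d_priv, batch_size=64, shuffle=True)
engine = PrivacyEngine()
model, opt, priv_loader = engine.make_private(
    module=model,
    optimizer=opt,
    data_loader=priv_loader,
    noise_multiplier=1.0,
    max_grad_norm=1.0,
)
for (tokens,) in priv_loader:
    if tokens.numel() == 0:  # Poisson sampling can yield empty batches
        continue
    opt.zero_grad()
    lm_loss(model, tokens).backward()
    opt.step()

print("epsilon spent:", engine.get_epsilon(delta=1e-5))
```

The piece that matters most in this sketch is the selection rule in Step 1: the abstract's claim is that choosing the public subset to better match the private distribution is what drives transfer, particularly when M is small.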