Powering Multi-Task Federated Learning with Competitive GPU Resource Sharing

  • Yongbo Yu,
  • Fuxun Yu,
  • Zirui Xu,
  • Di Wang,
  • Minjia Zhang,
  • Ang Li,
  • Shawn Bray,
  • Chenchen Liu,
  • Xiang Chen

Federated learning (FL) has been applied to train models for a variety of tasks, posing new computation challenges during training, especially when the scenario becomes multi-task. In this paper, we first profile the FL multi-task training process at the operator level to identify and address the problems in FL multi-task training. Second, we propose a Competitive GPU Resource Sharing method that efficiently partitions GPU resources among tasks to improve training efficiency. Third, to handle the imbalanced-data problem in FL multi-device training, we partition GPU resources according to the workload of the different models. Experiments show that our method achieves a 2.1× speedup.
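
The abstract does not specify the partitioning mechanism, so the following is only a minimal sketch of one plausible realization: giving each task's training process a GPU share proportional to its profiled workload via CUDA MPS's `CUDA_MPS_ACTIVE_THREAD_PERCENTAGE` limit. The names `launch_with_partition` and `train_task`, and the example workload numbers, are hypothetical; the paper's actual method may partition resources differently.

```python
import os
import multiprocessing as mp

def train_task(task_id: int) -> None:
    # Hypothetical per-task entry point; in practice this would run one
    # FL task's local training loop on the shared GPU.
    pct = os.environ.get("CUDA_MPS_ACTIVE_THREAD_PERCENTAGE", "100")
    print(f"task {task_id}: training with {pct}% of GPU SM threads")
    # ... model construction and training loop would go here ...

def launch_with_partition(workloads: dict[int, float]) -> None:
    """Spawn one training process per task, capping each process at a
    GPU share proportional to its profiled workload.

    Assumes an NVIDIA MPS daemon is running; MPS reads the
    CUDA_MPS_ACTIVE_THREAD_PERCENTAGE variable per client process and
    limits the fraction of SM threads that process may occupy.
    """
    total = sum(workloads.values())
    procs = []
    for task_id, load in workloads.items():
        share = max(1, round(100 * load / total))
        # Child processes inherit the environment at start() time, so
        # each one sees its own thread-percentage cap.
        os.environ["CUDA_MPS_ACTIVE_THREAD_PERCENTAGE"] = str(share)
        p = mp.Process(target=train_task, args=(task_id,))
        p.start()
        procs.append(p)
    for p in procs:
        p.join()

if __name__ == "__main__":
    # Illustrative workloads, e.g. profiled per-step GPU time of two
    # FL task models sharing one GPU (values are made up).
    launch_with_partition({0: 3.0, 1: 1.0})
```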