Over-parameterized Model Optimization with Polyak-Łojasiewicz Condition
- Yixuan Chen
- Yubin Shi
- Mingzhi Dong
- Xiaochen Yang
- Dongsheng Li
- Yujiang Wang
- Robert Dick
- Qin Lv
- Yingying Zhao
- Fan Yang
- Ning Gu
- Li Shang
ICLR 2023
This work studies how to optimize over-parameterized deep models for better training efficiency and test performance. We first show theoretically why two properties of over-parameterized models matter: the convergence gap and the generalization gap. Further analysis shows that both gaps can be upper-bounded by the ratio of the Lipschitz constant to the Polyak-Łojasiewicz (PL) constant, a term we refer to as the condition number. These findings motivate a structured pruning method with a novel pruning criterion: we devise a gating network that dynamically detects and masks out poorly-behaved nodes of a deep model during training. The gating network is learned by minimizing the condition number of the target model, which can be implemented as an extra regularization term in the loss. Experiments show that the proposed method outperforms the baselines in both training efficiency and test performance, and suggest that it can generalize to a variety of deep network architectures and tasks.
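To make the ratio of the Lipschitz constant to the PL constant concrete, here is a minimal sketch on a toy problem; it is not the paper's method. It assumes the textbook PL inequality, 0.5 * ||grad f(w)||^2 >= mu * (f(w) - f*), and an over-parameterized least-squares loss f(w) = 0.5 * ||A w - b||^2 with more parameters than samples. For this loss, the gradient Lipschitz constant L is the largest squared singular value of A and the PL constant mu is the smallest nonzero one. The function names, the problem sizes, and the choice of loss are illustrative assumptions, and the abstract does not state how the paper estimates these constants for deep networks.

```python
import numpy as np

def pl_condition_number(A):
    """Return (L, mu, L / mu) for f(w) = 0.5 * ||A w - b||^2.

    Assumption: textbook definitions for this toy loss, not the paper's estimator.
    """
    s = np.linalg.svd(A, compute_uv=False)
    nonzero = s[s > 1e-10 * s.max()]  # drop numerically zero singular values
    L = nonzero.max() ** 2            # Lipschitz constant of the gradient
    mu = nonzero.min() ** 2           # PL constant
    return L, mu, L / mu

def loss(A, b, w):
    r = A @ w - b
    return 0.5 * float(r @ r)

def grad(A, b, w):
    return A.T @ (A @ w - b)

rng = np.random.default_rng(0)
n_samples, n_params = 20, 100  # over-parameterized: more parameters than samples
A = rng.standard_normal((n_samples, n_params))
b = rng.standard_normal(n_samples)

L, mu, kappa = pl_condition_number(A)

# A has full row rank here, so the data can be interpolated and f* = 0.
w = rng.standard_normal(n_params)
pl_holds = 0.5 * np.sum(grad(A, b, w) ** 2) >= mu * loss(A, b, w) - 1e-9

# Gradient descent with step size 1/L; under PL the loss contracts
# by a factor of at most (1 - mu / L) per step.
f0 = loss(A, b, w)
steps = 50
for _ in range(steps):
    w = w - grad(A, b, w) / L
bound = (1.0 - mu / L) ** steps * f0
within_bound = loss(A, b, w) <= bound + 1e-9

print(f"L={L:.2f}, mu={mu:.2f}, condition number={kappa:.2f}")
print("PL inequality holds:", bool(pl_holds))
print("loss within linear-rate bound:", bool(within_bound))
```

The loss is not strongly convex, since A has a nontrivial null space, yet the PL inequality still holds and gradient descent converges linearly. A smaller condition number gives a tighter per-step contraction factor, which is the intuition behind treating this ratio as a quantity worth minimizing.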