Learning Deep ResNet Blocks Sequentially using Boosting Theory
- Furon Huang ,
- Jordan Ash ,
- John Langford ,
- Robert E. Schapire
Deep neural networks are known to be difficult to train due to the instability of back-propagation. A deep \emph{residual network} (ResNet) with identity loops remedies this by stabilizing gradient computations. We prove a boosting theory for the ResNet architecture. We construct T weak module classifiers, each contains two of the T layers, such that the combined strong learner is a ResNet. Therefore, we introduce an alternative Deep ResNet training algorithm, \emph{BoostResNet}, which is particularly suitable in non-differentiable architectures. Our proposed algorithm merely requires a sequential training of T “shallow ResNets” which are inexpensive. We prove that the training error decays exponentially with the depth T if the \emph{weak module classifiers} that we train perform slightly better than some weak baseline. In other words, we propose a weak learning condition and prove a boosting theory for ResNet under the weak learning condition. Our results apply to general multi-class ResNets. A generalization error bound based on margin theory is proved and suggests ResNet’s resistant to overfitting under network with l1 norm bounded weights.