Multi-Level Composite Stochastic Optimization via Nested Variance Reduction
- Junyu Zhang
- Lin Xiao
arXiv preprint
We consider multi-level composite optimization problems where each mapping in the composition is either the expectation over a family of random smooth mappings or the sum of a finite number of smooth mappings. We present a normalized proximal approximate gradient (NPAG) method in which the approximate gradients are obtained via nested stochastic variance reduction. To find an approximate stationary point at which the expected norm of the gradient mapping is less than $\epsilon$, the total sample complexity of our method is $O(\epsilon^{-3})$ in the expectation case and $O(N+\sqrt{N}\epsilon^{-2})$ in the finite-sum case, where $N$ is the total number of functions across all composition levels. In addition, the dependence of our total sample complexity on the number of composition levels is polynomial, rather than exponential as in previous work.
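For concreteness, the setting can be written out as follows. This is a sketch in standard notation rather than a quotation from the paper: the symbols $m$ (number of levels), $\Psi$ (a convex, possibly nonsmooth regularizer handled by the proximal step), $\eta$ (a step size), $N_i$ (number of component mappings at level $i$), and $y_i$ (intermediate values) are assumptions introduced here for illustration.

$$
\min_{x \in \mathbb{R}^d} \; F(x) + \Psi(x), \qquad F(x) = f_m \circ f_{m-1} \circ \cdots \circ f_1(x),
$$

$$
f_i(y) = \mathbb{E}_{\xi_i}\big[f_{i,\xi_i}(y)\big] \quad \text{(expectation case)}, \qquad
f_i(y) = \frac{1}{N_i}\sum_{j=1}^{N_i} f_{i,j}(y), \quad N = \sum_{i=1}^{m} N_i \quad \text{(finite-sum case)}.
$$

With $y_0 = x$ and $y_i = f_i(y_{i-1})$, the chain rule gives

$$
\nabla F(x) = \big[f_1'(y_0)\big]^{T} \big[f_2'(y_1)\big]^{T} \cdots \big[f_m'(y_{m-1})\big]^{T},
$$

so an approximate gradient requires estimates of every intermediate value $y_i$ and every Jacobian $f_i'(y_{i-1})$, and estimation errors at inner levels propagate to the outer ones. Under these assumptions, the stationarity measure referred to above is the usual proximal gradient mapping

$$
\mathcal{G}_\eta(x) = \frac{1}{\eta}\Big(x - \operatorname{prox}_{\eta\Psi}\big(x - \eta \nabla F(x)\big)\Big), \qquad \text{with target } \; \mathbb{E}\,\big\|\mathcal{G}_\eta(x)\big\| \le \epsilon,
$$

which reduces to $\nabla F(x)$ when $\Psi \equiv 0$.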