Towards Resource-Elastic Machine Learning
- Dhruv Mahajan
- Sundararajan Sellamanickam
- Markus Weimer
- Keerthi Selvaraj
In this article, we argue that resource elasticity is a key requirement for distributed machine learning. Not only do computational resources disappear without warning (e.g., due to machine failure), but modern resource managers also renegotiate the available resources while a job is running: additional machines may become available, or machines already reserved for the job may be reassigned to other jobs. We show how to formalize this problem and present an initial approach for linear learners.