The last decade has witnessed tremendous interest in large-scale data processing and, consequently, the rise of so-called big data systems. Apart from handling the scale and complexity of big data, it is also critical to improve resource efficiency and reduce operational costs in these systems. Interestingly, resource efficiency becomes an even harder problem with the new breed of so-called serverless query processing, where users do not have to set up clusters. Instead, the cloud provider takes care of allocating resources on a per-query basis. However, this allocation is very challenging because the relationship between the resources provided and the performance observed for a query is often non-intuitive, and even domain experts would struggle to manually pick the right set of resources for a given query.
The goal of this project is to develop tools and techniques that can help optimize resources in modern cloud query engines.