The query optimizer is a crucial component in a relational database system and is responsible for finding a good execution plan for a SQL query. For cloud database service providers, the importance of query optimization is amplified due to the scale (e.g., millions of databases hosted) and variety of different workloads for which the query optimizer is expected to work well “out-of-the-box”. Query optimization is challenging due to the richness of SQL queries that contain operators such as joins, group-by, aggregation, and nested sub-queries, the limited data statistics available during query optimization, and the need to keep time and resources for query optimization small. We are interested in a variety of problems related to query optimization. We highlight below some examples of our recent work in query optimization.
We conducted an empirical study to quantify the impact of cardinality estimation on plan quality in a state-of-the-art query optimizer (in the Microsoft SQL Server database engine), built using the Volcano/Cascades framework. We have also explored the viability of using data-driven techniques (e.g., ML models) that exploit knowledge of the workload for problems such as parameteric query optimization and cardinality estimation. With the increasing focus in research and industry on leveraging data-driven techniques for query optimization, we feel it is important to evaluate such systems fairly by taking into consideration the costs of obtaining training data as well as robustness to changing workload characteristics. For this purpose, we developed and open-sourced the DSB benchmark.