{"id":659616,"date":"2020-05-15T17:51:28","date_gmt":"2020-05-16T00:51:28","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-research-item&p=659616"},"modified":"2020-12-15T14:54:36","modified_gmt":"2020-12-15T22:54:36","slug":"bitvector-aware-query-optimization-for-decision-support-queries","status":"publish","type":"msr-research-item","link":"https:\/\/www.microsoft.com\/en-us\/research\/publication\/bitvector-aware-query-optimization-for-decision-support-queries\/","title":{"rendered":"Bitvector-aware Query Optimization for Decision Support Queries"},"content":{"rendered":"
Bitvector filtering is an important query processing technique that can significantly reduce the cost of execution, especially for complex decision support queries with multiple joins. Despite its wide application, however, its implication to query optimization is not well understood.<\/p>\n
In this work, we study how bitvector filters impact query optimization. We show that incorporating bitvector filters into query optimization straightforwardly can increase the plan space complexity by an exponential factor in the number of relations in the query. We analyze the plans with bitvector filters for star and snowflake queries in the plan space of right deep trees without cross products. Surprisingly, with some simplifying assumptions, we prove that, the plan of the minimal cost with bitvector filters can be found from a linear number of plans in the number of relations in the query. This greatly reduces the plan space complexity for such queries from exponential to linear.<\/p>\n
Motivated by our analysis, we propose an algorithm that accounts for the impact of bitvector filters in query optimization. Our algorithm optimizes the join order for an arbitrary decision support query by choosing from a linear number of candidate plans in the number of relations in the query. We implement our algorithm in Microsoft SQL Server as a transformation rule. Our evaluation on both industry standard benchmarks and customer workload shows that, compared with the original Microsoft SQL Server, our technique reduces the total CPU execution time by 22%-64% for the workloads, with up to two orders of magnitude reduction in CPU execution time for individual queries.<\/p>\n