{"id":700480,"date":"2020-10-23T04:30:53","date_gmt":"2020-10-23T11:30:53","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-research-item&p=700480"},"modified":"2022-07-11T15:55:41","modified_gmt":"2022-07-11T22:55:41","slug":"incorporating-super-operators-in-big-data-query-optimizers","status":"publish","type":"msr-research-item","link":"https:\/\/www.microsoft.com\/en-us\/research\/publication\/incorporating-super-operators-in-big-data-query-optimizers\/","title":{"rendered":"Incorporating Super-Operators in Big-Data Query Optimizers"},"content":{"rendered":"
The cost of big-data analytics is dominated by shuffle operations that induce multiple disk reads, writes and network transfers. This paper proposes a new class of optimization rules that are specifically aimed at eliminating shuffles where possible. The rules substitute multiple shuffle inducing operators (Join, UnionAll, Spool, GroupBy) with a single streaming operator which implements an entire sub-query. We call such operators super-operators. A key challenge with adding new rules that substitute sub-queries with super-operators is that there are many variants of the same sub-query that can be implemented via minor modifications to the same super-operator. Adding each as a separate rule leads to a search space explosion. We propose several extensions to the query optimizer to address this challenge. We propose a new abstract representation for operator trees that captures all possible sub-queries that a super-operator implements. We propose a new rule matching algorithm that can efficiently search for abstract operator trees. Finally, we extend the physical operator interface to introduce new parametric super-operators.We implement our changes in SCOPE, a state-of-the-art production big-data optimizer used extensively at Microsoft. We demonstrate that the proposed optimizations provide significant reduction in both resource cost (average 1.7x) and latency (average 1.5x) on several production queries, and do so without increasing optimization time.<\/div>\n","protected":false},"excerpt":{"rendered":"

The cost of big-data analytics is dominated by shuffle operations that induce multiple disk reads, writes and network transfers. This paper proposes a new class of optimization rules that are specifically aimed at eliminating shuffles where possible. The rules substitute multiple shuffle inducing operators (Join, UnionAll, Spool, GroupBy) with a single streaming operator which implements […]<\/p>\n","protected":false},"featured_media":0,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","footnotes":""},"msr-content-type":[3],"msr-research-highlight":[],"research-area":[13563,13547],"msr-publication-type":[193716],"msr-product-type":[],"msr-focus-area":[],"msr-platform":[],"msr-download-source":[],"msr-locale":[268875],"msr-post-option":[],"msr-field-of-study":[],"msr-conference":[],"msr-journal":[],"msr-impact-theme":[],"msr-pillar":[],"class_list":["post-700480","msr-research-item","type-msr-research-item","status-publish","hentry","msr-research-area-data-platform-analytics","msr-research-area-systems-and-networking","msr-locale-en_us"],"msr_publishername":"VLDB","msr_edition":"","msr_affiliation":"","msr_published_date":"2020-9-1","msr_host":"","msr_duration":"","msr_version":"","msr_speaker":"","msr_other_contributors":"","msr_booktitle":"","msr_pages_string":"","msr_chapter":"","msr_isbn":"","msr_journal":"","msr_volume":"","msr_number":"","msr_editors":"","msr_series":"","msr_issue":"","msr_organization":"","msr_how_published":"","msr_notes":"","msr_highlight_text":"","msr_release_tracker_id":"","msr_original_fields_of_study":"","msr_download_urls":"","msr_external_url":"","msr_secondary_video_url":"","msr_longbiography":"","msr_microsoftintellectualproperty":1,"msr_main_download":"","msr_publicationurl":"","msr_doi":"","msr_publication_uploader":[{"type":"url","viewUrl":"false","id":"false","title":"http:\/\/www.vldb.org\/pvldb\/vol13\/p348-leeka.pdf","label_id":"243109","label":0}],"msr_related_uploader":"","msr_attachments":[],"msr-author-ordering":[{"type":"user_nicename","value":"Kaushik Rajan","user_id":32574,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Kaushik Rajan"},{"type":"user_nicename","value":"Jyoti Leeka","user_id":40066,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Jyoti Leeka"}],"msr_impact_theme":[],"msr_research_lab":[199562],"msr_event":[687654],"msr_group":[144939],"msr_project":[440907],"publication":[],"video":[],"download":[],"msr_publication_type":"inproceedings","related_content":{"projects":[{"ID":440907,"post_title":"Optimizing Big-Data Queries using Program Reasoning","post_name":"optimizing-big-data-queries-using-program-reasoning","post_type":"msr-project","post_date":"2017-11-15 21:06:29","post_modified":"2021-12-14 21:23:33","post_status":"publish","permalink":"https:\/\/www.microsoft.com\/en-us\/research\/project\/optimizing-big-data-queries-using-program-reasoning\/","post_excerpt":"\u00a0This project is at the intersection of programming languages and database systems. The goal of the project is to use\u00a0programming languages techniques to analyze and optimize big-data queries. We show how program synthesis can be used to discover optimizations that big-data query optimizers miss today.\u00a0A big-data query optimizer produces an executable plan composed of map-reduce stages. We use program synthesis to produce plans with fewer stages than a query optimizer. A query optimizer has rules…","_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/440907"}]}}]},"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/700480"}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-research-item"}],"version-history":[{"count":5,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/700480\/revisions"}],"predecessor-version":[{"id":861063,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/700480\/revisions\/861063"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=700480"}],"wp:term":[{"taxonomy":"msr-content-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-content-type?post=700480"},{"taxonomy":"msr-research-highlight","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-highlight?post=700480"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=700480"},{"taxonomy":"msr-publication-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-publication-type?post=700480"},{"taxonomy":"msr-product-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-product-type?post=700480"},{"taxonomy":"msr-focus-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-focus-area?post=700480"},{"taxonomy":"msr-platform","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-platform?post=700480"},{"taxonomy":"msr-download-source","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-download-source?post=700480"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=700480"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=700480"},{"taxonomy":"msr-field-of-study","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-field-of-study?post=700480"},{"taxonomy":"msr-conference","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-conference?post=700480"},{"taxonomy":"msr-journal","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-journal?post=700480"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=700480"},{"taxonomy":"msr-pillar","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-pillar?post=700480"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}