{"id":619941,"date":"2019-12-05T17:58:04","date_gmt":"2019-12-06T01:58:04","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-research-item&p=619941"},"modified":"2024-04-04T18:11:11","modified_gmt":"2024-04-05T01:11:11","slug":"pushing-data-induced-predicates-through-joins-in-big-data-clusters","status":"publish","type":"msr-research-item","link":"https:\/\/www.microsoft.com\/en-us\/research\/publication\/pushing-data-induced-predicates-through-joins-in-big-data-clusters\/","title":{"rendered":"Pushing Data-Induced Predicates Through Joins in Big-Data Clusters"},"content":{"rendered":"

Using data statistics, we convert predicates on a table into data-induced predicates (diPs) that apply on the joining tables. Doing so substantially speeds up multi-relation queries because the benefits of predicate pushdown can now apply beyond just the tables that have predicates. We use diPs to skip data exclusively during query optimization; i.e., diPs lead to better plans and have no overhead during query execution. We study how to apply diPs for complex query expressions and how the usefulness of diPs varies with the data statistics used to construct diPs and the data distributions. Our results show that building diPs using zone-maps which are already maintained in today\u2019s clusters leads to sizable data skipping gains. Using a new (slightly larger) statistic, 50% of the queries in the TPC-H, TPC-DS and Join Order benchmarks can skip at least 33% of the query input. Consequently, the median query in a production big-data cluster finishes roughly 2\u00d7 faster.<\/p>\n","protected":false},"excerpt":{"rendered":"

Using data statistics, we convert predicates on a table into data-induced predicates (diPs) that apply on the joining tables. Doing so substantially speeds up multi-relation queries because the benefits of predicate pushdown can now apply beyond just the tables that have predicates. We use diPs to skip data exclusively during query optimization; i.e., diPs lead […]<\/p>\n","protected":false},"featured_media":0,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","footnotes":""},"msr-content-type":[3],"msr-research-highlight":[246574],"research-area":[13563,13547],"msr-publication-type":[193716],"msr-product-type":[],"msr-focus-area":[],"msr-platform":[],"msr-download-source":[],"msr-locale":[268875],"msr-post-option":[],"msr-field-of-study":[],"msr-conference":[],"msr-journal":[],"msr-impact-theme":[],"msr-pillar":[],"class_list":["post-619941","msr-research-item","type-msr-research-item","status-publish","hentry","msr-research-highlight-award","msr-research-area-data-platform-analytics","msr-research-area-systems-and-networking","msr-locale-en_us"],"msr_publishername":"","msr_edition":"","msr_affiliation":"","msr_published_date":"2020-8-31","msr_host":"","msr_duration":"","msr_version":"","msr_speaker":"","msr_other_contributors":"","msr_booktitle":"","msr_pages_string":"","msr_chapter":"","msr_isbn":"","msr_journal":"","msr_volume":"","msr_number":"","msr_editors":"","msr_series":"","msr_issue":"","msr_organization":"","msr_how_published":"","msr_notes":"","msr_highlight_text":"Best Paper Runner-Up","msr_release_tracker_id":"","msr_original_fields_of_study":"","msr_download_urls":"","msr_external_url":"","msr_secondary_video_url":"","msr_longbiography":"","msr_microsoftintellectualproperty":1,"msr_main_download":"","msr_publicationurl":"","msr_doi":"","msr_publication_uploader":[{"type":"file","viewUrl":"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2019\/11\/p364-kandula.pdf","id":"619962","title":"p364-kandula","label_id":"243109","label":0}],"msr_related_uploader":[{"type":"file","viewUrl":"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2019\/12\/dips_gsl.pptx","id":"1022661","title":"dips_gsl","label_id":"243115","label":0}],"msr_attachments":[{"id":1022667,"url":"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2024\/04\/dips_gsl.pptx"},{"id":1022661,"url":"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2024\/04\/dips_gsl.pptx"},{"id":619962,"url":"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2019\/11\/p364-kandula.pdf"}],"msr-author-ordering":[{"type":"user_nicename","value":"Srikanth Kandula","user_id":33707,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Srikanth Kandula"},{"type":"text","value":"Laurel Orr","user_id":0,"rest_url":false},{"type":"user_nicename","value":"Surajit Chaudhuri","user_id":33764,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Surajit Chaudhuri"}],"msr_impact_theme":[],"msr_research_lab":[],"msr_event":[],"msr_group":[957177,144899],"msr_project":[967236,239195],"publication":[],"video":[],"download":[],"msr_publication_type":"inproceedings","related_content":{"projects":[{"ID":967236,"post_title":"Query Optimization for Database Systems","post_name":"query-optimization-for-database-systems","post_type":"msr-project","post_date":"2023-12-11 15:19:29","post_modified":"2023-12-11 15:19:32","post_status":"publish","permalink":"https:\/\/www.microsoft.com\/en-us\/research\/project\/query-optimization-for-database-systems\/","post_excerpt":"The query optimizer is a crucial component in a relational database system and is responsible for finding a good execution plan for a SQL query. For cloud database service providers, the importance of query optimization is amplified due to the scale (e.g., millions of databases hosted) and variety of different workloads for which the query optimizer is expected to work well \"out-of-the-box\". Query optimization is challenging due to the richness of SQL queries that contain…","_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/967236"}]}},{"ID":239195,"post_title":"Quickr: Cost-Effective Data Analytics at Scale","post_name":"quickr-cost-effective-data-analytics-scale","post_type":"msr-project","post_date":"2020-02-28 15:32:39","post_modified":"2020-03-13 16:56:25","post_status":"publish","permalink":"https:\/\/www.microsoft.com\/en-us\/research\/project\/quickr-cost-effective-data-analytics-scale\/","post_excerpt":"Quickr explores cost-effective data analytics at scale, breaking new ground in research on offering approximate answers for complex ad-hoc queries.","_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/239195"}]}}]},"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/619941"}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-research-item"}],"version-history":[{"count":3,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/619941\/revisions"}],"predecessor-version":[{"id":1022670,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/619941\/revisions\/1022670"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=619941"}],"wp:term":[{"taxonomy":"msr-content-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-content-type?post=619941"},{"taxonomy":"msr-research-highlight","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-highlight?post=619941"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=619941"},{"taxonomy":"msr-publication-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-publication-type?post=619941"},{"taxonomy":"msr-product-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-product-type?post=619941"},{"taxonomy":"msr-focus-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-focus-area?post=619941"},{"taxonomy":"msr-platform","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-platform?post=619941"},{"taxonomy":"msr-download-source","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-download-source?post=619941"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=619941"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=619941"},{"taxonomy":"msr-field-of-study","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-field-of-study?post=619941"},{"taxonomy":"msr-conference","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-conference?post=619941"},{"taxonomy":"msr-journal","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-journal?post=619941"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=619941"},{"taxonomy":"msr-pillar","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-pillar?post=619941"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}