{"id":723622,"date":"2021-02-05T17:01:13","date_gmt":"2021-02-06T01:01:13","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-research-item&p=723622"},"modified":"2021-02-05T17:11:15","modified_gmt":"2021-02-06T01:11:15","slug":"towards-a-learning-optimizer-for-shared-clouds","status":"publish","type":"msr-research-item","link":"https:\/\/www.microsoft.com\/en-us\/research\/publication\/towards-a-learning-optimizer-for-shared-clouds\/","title":{"rendered":"Towards a learning optimizer for shared clouds"},"content":{"rendered":"
Query optimizers are notorious for inaccurate cost estimates, which lead to poor performance. The root of the problem lies in inaccurate cardinality estimates, i.e., estimates of the sizes of intermediate (and final) results in a query plan. These estimates also determine the resources consumed in modern shared cloud infrastructures. In this paper, we present CARDLEARNER, a machine-learning-based approach that learns cardinality models from previous job executions and uses them to predict cardinalities in future jobs. The key intuition behind our approach is that shared cloud workloads are often recurring and overlapping in nature, so we can learn cardinality models for overlapping subgraph templates. We discuss various learning approaches and show how learning a large number of smaller models results in high accuracy and explainability. We further present an exploration technique that avoids learning bias by considering alternate join orders and learning cardinality models over them. We describe the feedback loop that applies the learned models back to future job executions. Finally, we present a detailed evaluation of our models (up to 5 orders of magnitude less error), query plans (60% applicability), performance (up to 100% faster, 3x fewer resources), and exploration (optimal within a few tens of executions).<\/p>\n","protected":false},"excerpt":{"rendered":"
Query optimizers are notorious for inaccurate cost estimates, leading to poor performance. The root of the problem lies in inaccurate cardinality estimates, i.e., the size of intermediate (and final) results in a query plan. These estimates also determine the resources consumed in modern shared cloud infrastructures. In this paper, we present CARDLEARNER, a machine learning […]<\/p>\n","protected":false},"featured_media":0,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","footnotes":""},"msr-content-type":[3],"msr-research-highlight":[],"research-area":[13556,13563],"msr-publication-type":[193716],"msr-product-type":[],"msr-focus-area":[],"msr-platform":[],"msr-download-source":[],"msr-locale":[268875],"msr-post-option":[],"msr-field-of-study":[251560,247555,246691,251566,251569,251572,251563,248341],"msr-conference":[],"msr-journal":[],"msr-impact-theme":[],"msr-pillar":[],"class_list":["post-723622","msr-research-item","type-msr-research-item","status-publish","hentry","msr-research-area-artificial-intelligence","msr-research-area-data-platform-analytics","msr-locale-en_us","msr-field-of-study-cardinality","msr-field-of-study-cloud-computing","msr-field-of-study-computer-science","msr-field-of-study-cost-estimate","msr-field-of-study-feedback-loop","msr-field-of-study-intuition","msr-field-of-study-query-plan","msr-field-of-study-theoretical-computer-science"],"msr_publishername":"VLDB 
Endowment","msr_edition":"","msr_affiliation":"","msr_published_date":"2018-10-31","msr_host":"","msr_duration":"","msr_version":"","msr_speaker":"","msr_other_contributors":"","msr_booktitle":"","msr_pages_string":"","msr_chapter":"","msr_isbn":"","msr_journal":"","msr_volume":"","msr_number":"","msr_editors":"","msr_series":"","msr_issue":"","msr_organization":"","msr_how_published":"","msr_notes":"","msr_highlight_text":"","msr_release_tracker_id":"","msr_original_fields_of_study":"","msr_download_urls":"","msr_external_url":"","msr_secondary_video_url":"","msr_longbiography":"","msr_microsoftintellectualproperty":1,"msr_main_download":"","msr_publicationurl":"","msr_doi":"","msr_publication_uploader":[{"type":"doi","viewUrl":"false","id":"false","title":"10.14778\/3291264.3291267","label_id":"243106","label":0},{"type":"url","viewUrl":"false","id":"false","title":"http:\/\/www.vldb.org\/pvldb\/vol12\/p210-wu.pdf","label_id":"243132","label":0},{"type":"url","viewUrl":"false","id":"false","title":"https:\/\/dl.acm.org\/doi\/10.14778\/3291264.3291267","label_id":"243109","label":0}],"msr_related_uploader":"","msr_attachments":[],"msr-author-ordering":[{"type":"text","value":"Chenggang Wu","user_id":0,"rest_url":false},{"type":"user_nicename","value":"Alekh Jindal","user_id":37419,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Alekh Jindal"},{"type":"text","value":"Saeed Amizadeh","user_id":0,"rest_url":false},{"type":"text","value":"Hiren Patel","user_id":0,"rest_url":false},{"type":"text","value":"Wangchao Le","user_id":0,"rest_url":false},{"type":"text","value":"Shi Qiao","user_id":0,"rest_url":false},{"type":"text","value":"Sriram 
Rao","user_id":0,"rest_url":false}],"msr_impact_theme":[],"msr_research_lab":[],"msr_event":[],"msr_group":[684024],"msr_project":[723529,723517],"publication":[],"video":[],"download":[],"msr_publication_type":"inproceedings","related_content":{"projects":[{"ID":723529,"post_title":"Peregrine","post_name":"peregrine","post_type":"msr-project","post_date":"2021-02-05 16:07:39","post_modified":"2021-02-05 18:32:41","post_status":"publish","permalink":"https:\/\/www.microsoft.com\/en-us\/research\/project\/peregrine\/","post_excerpt":"Database administrators (DBAs) were traditionally responsible for optimizing the on-premise database workloads. However, with the rise of cloud data services where cloud providers offer fully managed data processing capabilities, the role of a DBA is completely missing. At the same time, workload optimization becomes even more important for reducing the total costs of operation and making data processing economically viable in the cloud. This project revisits workload optimization in the context of these emerging cloud-based…","_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/723529"}]}},{"ID":723517,"post_title":"Learning Optimizer","post_name":"learning-optimizer","post_type":"msr-project","post_date":"2021-02-05 16:02:37","post_modified":"2021-02-05 18:34:15","post_status":"publish","permalink":"https:\/\/www.microsoft.com\/en-us\/research\/project\/learning-optimizer\/","post_excerpt":"We are witnessing the rise of declarative big data systems. Examples include Hive, Spark, and Flink in the open source, and BigQuery, BigSQL, and SCOPE among proprietary systems. These systems take the declarative user queries as input and use a (typically cost based) query optimizer to pick the physical execution plans for that input. 
While query optimization has been a pain even in traditional databases, big data systems make the problem harder due to: (i)…","_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/723517"}]}}]},"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/723622","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-research-item"}],"version-history":[{"count":1,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/723622\/revisions"}],"predecessor-version":[{"id":723625,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/723622\/revisions\/723625"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=723622"}],"wp:term":[{"taxonomy":"msr-content-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-content-type?post=723622"},{"taxonomy":"msr-research-highlight","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-highlight?post=723622"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=723622"},{"taxonomy":"msr-publication-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-publication-type?post=723622"},{"taxonomy":"msr-product-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-product-type?post=723622"},{"taxonomy":"msr-focus-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-focus-area?post=723622"},{"taxonomy":"msr-platform","embeddable":true,"href
":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-platform?post=723622"},{"taxonomy":"msr-download-source","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-download-source?post=723622"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=723622"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=723622"},{"taxonomy":"msr-field-of-study","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-field-of-study?post=723622"},{"taxonomy":"msr-conference","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-conference?post=723622"},{"taxonomy":"msr-journal","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-journal?post=723622"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=723622"},{"taxonomy":"msr-pillar","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-pillar?post=723622"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}