{"id":723736,"date":"2021-02-05T17:31:39","date_gmt":"2021-02-06T01:31:39","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-research-item&p=723736"},"modified":"2021-04-02T15:44:27","modified_gmt":"2021-04-02T22:44:27","slug":"sparkcruise-handsfree-computation-reuse-in-spark","status":"publish","type":"msr-research-item","link":"https:\/\/www.microsoft.com\/en-us\/research\/publication\/sparkcruise-handsfree-computation-reuse-in-spark\/","title":{"rendered":"SparkCruise: handsfree computation reuse in Spark"},"content":{"rendered":"
Interactive data analytics is often inundated with common computations across multiple queries. These redundancies result in poor query performance and higher overall cost for the interactive query sessions. Obviously, reusing these common computations could lead to cost savings. However, it is difficult for the users to manually detect and reuse the common computations in their fast-moving interactive sessions. In this paper, we propose to demonstrate SparkCruise, a computation reuse system that automatically selects the most useful common computations to materialize based on the past query workload. SparkCruise materializes these computations as part of query processing, so users can continue with their query processing just as before while computation reuse is applied automatically in the background, all without any modifications to the Spark code. We will invite the audience to play with several scenarios, such as workload redundancy insights and pay-as-you-go materialization, highlighting the utility of SparkCruise.<\/p>\n","protected":false},"excerpt":{"rendered":"
Interactive data analytics is often inundated with common computations across multiple queries. These redundancies result in poor query performance and higher overall cost for the interactive query sessions. Obviously, reusing these common computations could lead to cost savings. However, it is difficult for the users to manually detect and reuse the common computations in their […]<\/p>\n","protected":false},"featured_media":0,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","footnotes":""},"msr-content-type":[3],"msr-research-highlight":[],"research-area":[13563],"msr-publication-type":[193716],"msr-product-type":[],"msr-focus-area":[],"msr-platform":[],"msr-download-source":[],"msr-locale":[268875],"msr-post-option":[],"msr-field-of-study":[251743,246739,246691,251740,246781],"msr-conference":[],"msr-journal":[],"msr-impact-theme":[],"msr-pillar":[],"class_list":["post-723736","msr-research-item","type-msr-research-item","status-publish","hentry","msr-research-area-data-platform-analytics","msr-locale-en_us","msr-field-of-study-computation-reuse","msr-field-of-study-computer-hardware","msr-field-of-study-computer-science","msr-field-of-study-handsfree","msr-field-of-study-spark-mathematics"],"msr_publishername":"VLDB 
Endowment","msr_edition":"","msr_affiliation":"","msr_published_date":"2019-7-31","msr_host":"","msr_duration":"","msr_version":"","msr_speaker":"","msr_other_contributors":"","msr_booktitle":"","msr_pages_string":"","msr_chapter":"","msr_isbn":"","msr_journal":"","msr_volume":"","msr_number":"","msr_editors":"","msr_series":"","msr_issue":"","msr_organization":"","msr_how_published":"","msr_notes":"","msr_highlight_text":"","msr_release_tracker_id":"","msr_original_fields_of_study":"","msr_download_urls":"","msr_external_url":"","msr_secondary_video_url":"","msr_longbiography":"","msr_microsoftintellectualproperty":1,"msr_main_download":"","msr_publicationurl":"","msr_doi":"","msr_publication_uploader":[{"type":"doi","viewUrl":"false","id":"false","title":"10.14778\/3352063.3352082","label_id":"243106","label":0},{"type":"url","viewUrl":"false","id":"false","title":"http:\/\/www.vldb.org\/pvldb\/vol12\/p1850-roy.pdf","label_id":"243132","label":0},{"type":"url","viewUrl":"false","id":"false","title":"https:\/\/dblp.uni-trier.de\/db\/journals\/pvldb\/pvldb12.html#RoyJPGKC19","label_id":"243109","label":0}],"msr_related_uploader":"","msr_attachments":[],"msr-author-ordering":[{"type":"user_nicename","value":"Abhishek Roy","user_id":40033,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Abhishek Roy"},{"type":"user_nicename","value":"Alekh Jindal","user_id":37419,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Alekh Jindal"},{"type":"text","value":"Hiren Patel","user_id":0,"rest_url":false},{"type":"text","value":"Ashit Gosalia","user_id":0,"rest_url":false},{"type":"user_nicename","value":"Subramaniam Venkatraman Krishnan","user_id":33746,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Subramaniam Venkatraman Krishnan"},{"type":"user_nicename","value":"Carlo 
Curino","user_id":31352,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Carlo Curino"}],"msr_impact_theme":[],"msr_research_lab":[],"msr_event":[],"msr_group":[684024],"msr_project":[723529,595033],"publication":[],"video":[],"download":[],"msr_publication_type":"inproceedings","related_content":{"projects":[{"ID":723529,"post_title":"Peregrine","post_name":"peregrine","post_type":"msr-project","post_date":"2021-02-05 16:07:39","post_modified":"2021-02-05 18:32:41","post_status":"publish","permalink":"https:\/\/www.microsoft.com\/en-us\/research\/project\/peregrine\/","post_excerpt":"Database administrators (DBAs) were traditionally responsible for optimizing the on-premise database workloads. However, with the rise of cloud data services where cloud providers offer fully managed data processing capabilities, the role of a DBA is completely missing. At the same time, workload optimization becomes even more important for reducing the total costs of operation and making data processing economically viable in the cloud. This project revisits workload optimization in the context of these emerging cloud-based…","_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/723529"}]}},{"ID":595033,"post_title":"CloudViews","post_name":"cloudviews","post_type":"msr-project","post_date":"2019-06-21 20:17:53","post_modified":"2021-02-05 18:33:29","post_status":"publish","permalink":"https:\/\/www.microsoft.com\/en-us\/research\/project\/cloudviews\/","post_excerpt":"Analytics-as-a-service, or analytics job service, is emerging as a new paradigm for enterprise data analytics. These services are motivated by the fact that setting up and running data analytics is a major hurdle for enterprises. 
Although platform as a service (PaaS), software as a service (SaaS), and more recently database as a service (DBaaS) have eased the pain of provisioning and scaling hardware and software infrastructures, users are still responsible for managing and tuning their…","_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/595033"}]}}]},"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/723736"}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-research-item"}],"version-history":[{"count":1,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/723736\/revisions"}],"predecessor-version":[{"id":723739,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/723736\/revisions\/723739"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=723736"}],"wp:term":[{"taxonomy":"msr-content-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-content-type?post=723736"},{"taxonomy":"msr-research-highlight","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-highlight?post=723736"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=723736"},{"taxonomy":"msr-publication-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-publication-type?post=723736"},{"taxonomy":"msr-product-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-product-type?post=723736"},{"taxonomy":"msr-focus-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/r
esearch\/wp-json\/wp\/v2\/msr-focus-area?post=723736"},{"taxonomy":"msr-platform","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-platform?post=723736"},{"taxonomy":"msr-download-source","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-download-source?post=723736"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=723736"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=723736"},{"taxonomy":"msr-field-of-study","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-field-of-study?post=723736"},{"taxonomy":"msr-conference","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-conference?post=723736"},{"taxonomy":"msr-journal","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-journal?post=723736"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=723736"},{"taxonomy":"msr-pillar","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-pillar?post=723736"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}