{"id":723658,"date":"2021-02-05T17:15:30","date_gmt":"2021-02-06T01:15:30","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-research-item&p=723658"},"modified":"2021-03-12T14:10:37","modified_gmt":"2021-03-12T22:10:37","slug":"computation-reuse-in-analytics-job-service-at-microsoft","status":"publish","type":"msr-research-item","link":"https:\/\/www.microsoft.com\/en-us\/research\/publication\/computation-reuse-in-analytics-job-service-at-microsoft\/","title":{"rendered":"Computation Reuse in Analytics Job Service at Microsoft"},"content":{"rendered":"
Analytics-as-a-service, or analytics job service, is emerging as a new paradigm for data analytics, be it in a cloud environment or within enterprises. In this setting, users are not required to manage or tune their hardware and software infrastructure, and they pay only for the processing resources consumed per job. However, the shared nature of these job services across several users and teams leads to significant overlaps in partial computations, i.e., parts of the processing are duplicated across multiple jobs, thus generating redundant costs. In this paper, we describe a computation reuse framework, coined CLOUDVIEWS, which we built to address the computation overlap problem in Microsoft’s SCOPE job service. We present a detailed analysis from our production workloads to motivate the computation overlap problem and the possible gains from computation reuse. The key aspects of our system are the following: (i) we reuse computations by creating materialized views over recurring workloads, i.e., periodically executing jobs that have the same script templates but process new data each time, (ii) we select the views to materialize using a feedback loop that reconciles the compile-time and run-time statistics and gathers precise measures of the utility and cost of each overlapping computation, and (iii) we create materialized views in an online fashion, without requiring an offline phase to materialize the overlapping computations.<\/p>\n","protected":false},"excerpt":{"rendered":"
Analytics-as-a-service, or analytics job service, is emerging as a new paradigm for data analytics, be it in a cloud environment or within enterprises. In this setting, users are not required to manage or tune their hardware and software infrastructure, and they pay only for the processing resources consumed per job. However, the shared nature of […]<\/p>\n","protected":false},"featured_media":0,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","footnotes":""},"msr-content-type":[3],"msr-research-highlight":[],"research-area":[13563,13547],"msr-publication-type":[193716],"msr-product-type":[],"msr-focus-area":[],"msr-platform":[],"msr-download-source":[],"msr-locale":[268875],"msr-post-option":[],"msr-field-of-study":[246784,247555,250378,246691,247549,251569,251605,251608,248326],"msr-conference":[],"msr-journal":[],"msr-impact-theme":[],"msr-pillar":[],"class_list":["post-723658","msr-research-item","type-msr-research-item","status-publish","hentry","msr-research-area-data-platform-analytics","msr-research-area-systems-and-networking","msr-locale-en_us","msr-field-of-study-analytics","msr-field-of-study-cloud-computing","msr-field-of-study-computation","msr-field-of-study-computer-science","msr-field-of-study-data-analysis","msr-field-of-study-feedback-loop","msr-field-of-study-materialized-view","msr-field-of-study-reuse","msr-field-of-study-software"],"msr_publishername":"ACM","msr_edition":"","msr_affiliation":"","msr_published_date":"2018-5-26","msr_host":"","msr_duration":"","msr_version":"","msr_speaker":"","msr_other_contributors":"","msr_booktitle":"","msr_pages_string":"","msr_chapter":"","msr_isbn":"","msr_journal":"","msr_volume":"","msr_number":"","msr_editors":"","msr_series":"","msr_issue":"","msr_organization":"","msr_how_published":"","msr_notes":"","msr_highlight_text":"","msr_release_tracker_id":"","msr_original_fields_of_study":"","msr_download_urls":"","msr_external_url":"","msr_secondary_video_url":"","msr_longbiography":"","msr_microsoftintellectualproperty":1,"msr_main_download":"","msr_publicationurl":"","msr_doi":"","msr_publication_uploader":[{"type":"doi","viewUrl":"false","id":"false","title":"10.1145\/3183713.3190656","label_id":"243106","label":0},{"type":"url","viewUrl":"false","id":"false","title":"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2018\/03\/cloudviews-sigmod2018.pdf","label_id":"243132","label":0}],"msr_related_uploader":"","msr_attachments":[],"msr-author-ordering":[{"type":"user_nicename","value":"Alekh Jindal","user_id":37419,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Alekh Jindal"},{"type":"text","value":"Shi Qiao","user_id":0,"rest_url":false},{"type":"text","value":"Hiren Patel","user_id":0,"rest_url":false},{"type":"text","value":"Zhicheng Yin","user_id":0,"rest_url":false},{"type":"text","value":"Jieming Di","user_id":0,"rest_url":false},{"type":"text","value":"Malay Bag","user_id":0,"rest_url":false},{"type":"text","value":"Marc Friedman","user_id":0,"rest_url":false},{"type":"text","value":"Yifung Lin","user_id":0,"rest_url":false},{"type":"user_nicename","value":"Konstantinos Karanasos","user_id":32565,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Konstantinos Karanasos"},{"type":"user_nicename","value":"Sriram Rao","user_id":33712,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Sriram Rao"}],"msr_impact_theme":[],"msr_research_lab":[],"msr_event":[],"msr_group":[684024],"msr_project":[723529,474279,595033],"publication":[],"video":[],"download":[],"msr_publication_type":"inproceedings","related_content":{"projects":[{"ID":723529,"post_title":"Peregrine","post_name":"peregrine","post_type":"msr-project","post_date":"2021-02-05 16:07:39","post_modified":"2021-02-05 18:32:41","post_status":"publish","permalink":"https:\/\/www.microsoft.com\/en-us\/research\/project\/peregrine\/","post_excerpt":"Database administrators (DBAs) were traditionally responsible for optimizing the on-premise database workloads. However, with the rise of cloud data services where cloud providers offer fully managed data processing capabilities, the role of a DBA is completely missing. At the same time, workload optimization becomes even more important for reducing the total costs of operation and making data processing economically viable in the cloud. This project revisits workload optimization in the context of these emerging cloud-based…","_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/723529"}]}},{"ID":474279,"post_title":"CloudViews: Materialized Views for Big Data Workloads","post_name":"cloudviews-materialized-views-big-data-workloads","post_type":"msr-project","post_date":"2018-03-16 15:02:38","post_modified":"2018-03-16 15:21:41","post_status":"publish","permalink":"https:\/\/www.microsoft.com\/en-us\/research\/project\/cloudviews-materialized-views-big-data-workloads\/","post_excerpt":"Analytics workloads in Microsoft's clusters are generated by users submitting SCOPE scripts, which are compiled down to jobs, which are then executed on the clusters. \u00a0An analysis of these workloads suggests\u00a0that there is significant overlap between the sub-graphs of the different jobs. \u00a0That is, the sub-graphs of different jobs compute the same result. \u00a0 This suggests that we can build upon view materialization to improve performance. \u00a0However, doing view materialization at a massive scale, where…","_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/474279"}]}},{"ID":595033,"post_title":"CloudViews","post_name":"cloudviews","post_type":"msr-project","post_date":"2019-06-21 20:17:53","post_modified":"2021-02-05 18:33:29","post_status":"publish","permalink":"https:\/\/www.microsoft.com\/en-us\/research\/project\/cloudviews\/","post_excerpt":"Analytics-as-a-service, or analytics job service, is emerging as a new paradigm for enterprise data analytics. These services are motivated by the fact that setting up and running data analytics is a major hurdle for enterprises. Although platform as a service (PaaS), software as a service (SaaS), and more recently database as a service (DBaaS) have eased the pain of provisioning and scaling hardware and software infrastructures, users are still responsible for managing and tuning their…","_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/595033"}]}}]},"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/723658"}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-research-item"}],"version-history":[{"count":1,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/723658\/revisions"}],"predecessor-version":[{"id":723661,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/723658\/revisions\/723661"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=723658"}],"wp:term":[{"taxonomy":"msr-content-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-content-type?post=723658"},{"taxonomy":"msr-research-highlight","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-highlight?post=723658"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=723658"},{"taxonomy":"msr-publication-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-publication-type?post=723658"},{"taxonomy":"msr-product-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-product-type?post=723658"},{"taxonomy":"msr-focus-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-focus-area?post=723658"},{"taxonomy":"msr-platform","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-platform?post=723658"},{"taxonomy":"msr-download-source","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-download-source?post=723658"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=723658"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=723658"},{"taxonomy":"msr-field-of-study","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-field-of-study?post=723658"},{"taxonomy":"msr-conference","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-conference?post=723658"},{"taxonomy":"msr-journal","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-journal?post=723658"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=723658"},{"taxonomy":"msr-pillar","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-pillar?post=723658"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}