{"id":747562,"date":"2021-05-21T02:19:56","date_gmt":"2021-05-21T09:19:56","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-research-item&p=747562"},"modified":"2022-07-11T16:00:28","modified_gmt":"2022-07-11T23:00:28","slug":"production-experiences-from-computation-reuse-at-microsoft","status":"publish","type":"msr-research-item","link":"https:\/\/www.microsoft.com\/en-us\/research\/publication\/production-experiences-from-computation-reuse-at-microsoft\/","title":{"rendered":"Production Experiences from Computation Reuse at Microsoft"},"content":{"rendered":"
Massive data processing infrastructures are commonplace in modern data-driven enterprises. They help data engineers build scalable data pipelines over shared datasets. Unfortunately, data engineers often end up building pipelines whose computations partly overlap with those of other pipelines over the same shared datasets. Consolidating these data pipelines is therefore crucial for eliminating redundancies and improving production efficiency, thus saving significant operational costs. We built CloudViews for automatic computation reuse in Cosmos big data workloads at Microsoft. CloudViews added a feedback loop to the SCOPE query engine that learns from past workloads and opportunistically materializes and reuses common computations as part of query processing in future SCOPE jobs, all completely automatic and transparent to users.<\/p>\n
In this paper, we describe our production experiences with CloudViews. We first describe the data preparation process in Cosmos and show how computation reuse naturally augments it: computation reuse further prepares data into more shareable datasets that can improve the performance and efficiency of subsequent processing. We then discuss the usage and impact of CloudViews on our production clusters and describe many of the operational challenges that we have faced so far. Results from our current production deployment over a two-month window show that the cumulative latency of jobs improved by 34%, with a median improvement of 15%, and that total processing time fell by 37%, indicating better customer experience and lower operational costs for these workloads.<\/p>\n","protected":false},"excerpt":{"rendered":"
Massive data processing infrastructures are commonplace in modern data-driven enterprises. They facilitate data engineers in building scalable data pipelines over shared datasets. Unfortunately, data engineers often end up building pipelines that have portions of their computations common across other pipelines over the same set of shared datasets. Consolidating these data pipelines is therefore crucial for […]<\/p>\n","protected":false},"featured_media":0,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","footnotes":""},"msr-content-type":[3],"msr-research-highlight":[],"research-area":[13563],"msr-publication-type":[193716],"msr-product-type":[],"msr-focus-area":[],"msr-platform":[],"msr-download-source":[],"msr-locale":[268875],"msr-post-option":[],"msr-field-of-study":[251743,246691,256114],"msr-conference":[],"msr-journal":[],"msr-impact-theme":[],"msr-pillar":[],"class_list":["post-747562","msr-research-item","type-msr-research-item","status-publish","hentry","msr-research-area-data-platform-analytics","msr-locale-en_us","msr-field-of-study-computation-reuse","msr-field-of-study-computer-science","msr-field-of-study-production-economics"],"msr_publishername":"","msr_edition":"","msr_affiliation":"","msr_published_date":"2021-3-23","msr_host":"","msr_duration":"","msr_version":"","msr_speaker":"","msr_other_contributors":"","msr_booktitle":"","msr_pages_string":"","msr_chapter":"","msr_isbn":"","msr_journal":"","msr_volume":"","msr_number":"","msr_editors":"","msr_series":"","msr_issue":"","msr_organization":"","msr_how_published":"","msr_notes":"","msr_highlight_text":"","msr_release_tracker_id":"","msr_original_fields_of_study":"","msr_download_urls":"","msr_external_url":"","msr_secondary_video_url":"","msr_longbiography":"","msr_microsoftintellectualproperty":1,"msr_main_download":"","msr_publicationurl":"","msr_doi":"","msr_publication_u
ploader":[{"type":"url","viewUrl":"false","id":"false","title":"https:\/\/edbt2021proceedings.github.io\/docs\/p165.pdf","label_id":"243109","label":0}],"msr_related_uploader":"","msr_attachments":[],"msr-author-ordering":[{"type":"user_nicename","value":"Alekh Jindal","user_id":37419,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Alekh Jindal"},{"type":"guest","value":"shi-qiao","user_id":595051,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=shi-qiao"},{"type":"guest","value":"hiren-patel","user_id":595048,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=hiren-patel"},{"type":"user_nicename","value":"Abhishek Roy","user_id":40033,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Abhishek Roy"},{"type":"user_nicename","value":"Jyoti Leeka","user_id":40066,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Jyoti Leeka"},{"type":"user_nicename","value":"Brandon Haynes","user_id":40000,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Brandon Haynes"}],"msr_impact_theme":[],"msr_research_lab":[],"msr_event":[],"msr_group":[684024],"msr_project":[595033],"publication":[],"video":[],"download":[],"msr_publication_type":"inproceedings","related_content":{"projects":[{"ID":595033,"post_title":"CloudViews","post_name":"cloudviews","post_type":"msr-project","post_date":"2019-06-21 20:17:53","post_modified":"2021-02-05 18:33:29","post_status":"publish","permalink":"https:\/\/www.microsoft.com\/en-us\/research\/project\/cloudviews\/","post_excerpt":"Analytics-as-a-service, or analytics job service, is emerging as a new paradigm for enterprise data analytics. 
These services are motivated by the fact that setting up and running data analytics is a major hurdle for enterprises. Although platform as a service (PaaS), software as a service (SaaS), and more recently database as a service (DBaaS) have eased the pain of provisioning and scaling hardware and software infrastructures, users are still responsible for managing and tuning their…","_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/595033"}]}}]},"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/747562"}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-research-item"}],"version-history":[{"count":3,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/747562\/revisions"}],"predecessor-version":[{"id":861108,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/747562\/revisions\/861108"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=747562"}],"wp:term":[{"taxonomy":"msr-content-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-content-type?post=747562"},{"taxonomy":"msr-research-highlight","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-highlight?post=747562"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=747562"},{"taxonomy":"msr-publication-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-publication-type?post=747562"},{"taxonomy":"msr-product-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-
product-type?post=747562"},{"taxonomy":"msr-focus-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-focus-area?post=747562"},{"taxonomy":"msr-platform","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-platform?post=747562"},{"taxonomy":"msr-download-source","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-download-source?post=747562"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=747562"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=747562"},{"taxonomy":"msr-field-of-study","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-field-of-study?post=747562"},{"taxonomy":"msr-conference","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-conference?post=747562"},{"taxonomy":"msr-journal","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-journal?post=747562"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=747562"},{"taxonomy":"msr-pillar","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-pillar?post=747562"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}