{"id":737704,"date":"2021-04-01T17:57:24","date_gmt":"2021-04-02T00:57:24","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-research-item&p=737704"},"modified":"2021-11-13T00:10:11","modified_gmt":"2021-11-13T08:10:11","slug":"smartharvest-harvesting-idle-cpus-safely-and-efficiently-in-the-cloud","status":"publish","type":"msr-research-item","link":"https:\/\/www.microsoft.com\/en-us\/research\/publication\/smartharvest-harvesting-idle-cpus-safely-and-efficiently-in-the-cloud\/","title":{"rendered":"SmartHarvest: Harvesting Idle CPUs Safely and Efficiently in the Cloud"},"content":{"rendered":"
We can increase the efficiency of public cloud datacenters by harvesting allocated but temporarily idling CPU cores from customer virtual machines (VMs) to run batch or analytics workloads.\u00a0 Even small efficiency gains translate into substantial savings, since provisioning and operating a datacenter costs hundreds of millions of dollars per year.\u00a0 The main challenge is to harvest idle cores with little or no impact on customer VMs, which could be running latency-sensitive services and are essentially black-boxes to the cloud provider.<\/p>\n
We introduce ElasticVM, a new VM type that can run batch workloads cheaply using mainly harvested cores.\u00a0 We also propose SmartHarvest, a system that dynamically manages the number of cores available to ElasticVMs in each fine-grained time window.\u00a0 SmartHarvest uses online learning to predict the core demand of primary, customer VMs and compute the number of cores that can be safely harvested.\u00a0 Our results show that SmartHarvest can harvest a significant amount of CPU resources without increasing the 99th-percentile tail latency of latency-critical primary workloads by more than 10%.\u00a0 Unlike static harvesting techniques that rely on offline profiling, SmartHarvest is robust to different primary workloads, batch workloads, and load changes.\u00a0 Finally, we show that the online learning in SmartHarvest is complementary to systems optimizations for VM management.<\/p>\n
<\/p>\n","protected":false},"excerpt":{"rendered":"
We can increase the efficiency of public cloud datacenters by harvesting allocated but temporarily idling CPU cores from customer virtual machines (VMs) to run batch or analytics workloads.\u00a0 Even small efficiency gains translate into substantial savings, since provisioning and operating a datacenter costs hundreds of millions of dollars per year.\u00a0 The main challenge is to […]<\/p>\n","protected":false},"featured_media":0,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","footnotes":""},"msr-content-type":[3],"msr-research-highlight":[],"research-area":[13547],"msr-publication-type":[193716],"msr-product-type":[],"msr-focus-area":[],"msr-platform":[],"msr-download-source":[],"msr-locale":[268875],"msr-post-option":[],"msr-field-of-study":[],"msr-conference":[],"msr-journal":[],"msr-impact-theme":[],"msr-pillar":[],"class_list":["post-737704","msr-research-item","type-msr-research-item","status-publish","hentry","msr-research-area-systems-and-networking","msr-locale-en_us"],"msr_publishername":"","msr_edition":"","msr_affiliation":"","msr_published_date":"2021-4-28","msr_host":"","msr_duration":"","msr_version":"","msr_speaker":"","msr_other_contributors":"","msr_booktitle":"","msr_pages_string":"","msr_chapter":"","msr_isbn":"","msr_journal":"","msr_volume":"","msr_number":"","msr_editors":"","msr_series":"","msr_issue":"","msr_organization":"","msr_how_published":"","msr_notes":"","msr_highlight_text":"","msr_release_tracker_id":"","msr_original_fields_of_study":"","msr_download_urls":"","msr_external_url":"","msr_secondary_video_url":"","msr_longbiography":"","msr_microsoftintellectualproperty":1,"msr_main_download":"","msr_publicationurl":"","msr_doi":"","msr_publication_uploader":[{"type":"file","viewUrl":"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2021\/04\/SmartHarvest-Eurosys21.pdf","id":"738082","title":"smartharvest-eurosys21","label_id":"243109","label":0}],"msr_related_uploader":"","msr_attachments":[{"id":738082,"url":"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2021\/04\/SmartHarvest-Eurosys21.pdf"},{"id":737710,"url":"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2021\/04\/smartharvest-camera-ready.pdf"}],"msr-author-ordering":[{"type":"text","value":"Yawen Wang","user_id":0,"rest_url":false},{"type":"user_nicename","value":"Kapil Arya","user_id":37824,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Kapil Arya"},{"type":"text","value":"Marios Kogias","user_id":0,"rest_url":false},{"type":"text","value":"Manohar Vanga","user_id":0,"rest_url":false},{"type":"text","value":"Aditya Bhandari","user_id":0,"rest_url":false},{"type":"text","value":"Neeraja J. Yadwadkar","user_id":0,"rest_url":false},{"type":"user_nicename","value":"Siddhartha Sen","user_id":33656,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Siddhartha Sen"},{"type":"user_nicename","value":"Sameh Elnikety","user_id":33503,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Sameh Elnikety"},{"type":"text","value":"Christos Kozyrakis","user_id":0,"rest_url":false},{"type":"user_nicename","value":"Ricardo Bianchini","user_id":33393,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Ricardo Bianchini"}],"msr_impact_theme":[],"msr_research_lab":[199565,199571],"msr_event":[],"msr_group":[144927,282170],"msr_project":[616008,573039],"publication":[],"video":[],"download":[],"msr_publication_type":"inproceedings","related_content":{"projects":[{"ID":616008,"post_title":"LEAP: Lean, Efficient, And Predictable Cloud Platforms","post_name":"leap","post_type":"msr-project","post_date":"2019-10-17 15:35:49","post_modified":"2023-06-06 15:41:13","post_status":"publish","permalink":"https:\/\/www.microsoft.com\/en-us\/research\/project\/leap\/","post_excerpt":"No public cloud can host large latency-sensitive services, such as search engines, in a way that is economic for those services today! Project LEAP (short for Lean, Efficient, And Predictable) addresses the research challenges in enabling cloud platforms to host these services in a rightsized, elastic, and with predictable tail latency.\u00a0 The challenges include how to prevent interference from co-located workloads, how to prevent performance jitter due to I\/O, how to make the services elastic…","_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/616008"}]}},{"ID":573039,"post_title":"Machine Learning for Systems and Tiered AIOps","post_name":"resource-central","post_type":"msr-project","post_date":"2019-03-12 17:02:33","post_modified":"2023-06-06 15:49:16","post_status":"publish","permalink":"https:\/\/www.microsoft.com\/en-us\/research\/project\/resource-central\/","post_excerpt":"Resource Central\u00a0is a general ML and prediction-serving system deployed in Azure Compute.\u00a0 It trains ML models offline and uses them to produce predictions online.","_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/573039"}]}}]},"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/737704"}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-research-item"}],"version-history":[{"count":3,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/737704\/revisions"}],"predecessor-version":[{"id":738085,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/737704\/revisions\/738085"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=737704"}],"wp:term":[{"taxonomy":"msr-content-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-content-type?post=737704"},{"taxonomy":"msr-research-highlight","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-highlight?post=737704"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=737704"},{"taxonomy":"msr-publication-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-publication-type?post=737704"},{"taxonomy":"msr-product-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-product-type?post=737704"},{"taxonomy":"msr-focus-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-focus-area?post=737704"},{"taxonomy":"msr-platform","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-platform?post=737704"},{"taxonomy":"msr-download-source","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-download-source?post=737704"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=737704"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=737704"},{"taxonomy":"msr-field-of-study","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-field-of-study?post=737704"},{"taxonomy":"msr-conference","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-conference?post=737704"},{"taxonomy":"msr-journal","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-journal?post=737704"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=737704"},{"taxonomy":"msr-pillar","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-pillar?post=737704"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}