{"id":166828,"date":"2020-02-28T19:00:58","date_gmt":"2014-08-01T00:00:00","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/msr-research-item\/multi-resource-packing-for-cluster-schedulers\/"},"modified":"2021-12-07T14:45:58","modified_gmt":"2021-12-07T22:45:58","slug":"multi-resource-packing-for-cluster-schedulers","status":"publish","type":"msr-research-item","link":"https:\/\/www.microsoft.com\/en-us\/research\/publication\/multi-resource-packing-for-cluster-schedulers\/","title":{"rendered":"Multi-resource Packing for Cluster Schedulers"},"content":{"rendered":"<div class=\"asset-content\">\n<p>Tasks in modern data-parallel clusters have highly diverse resource requirements along CPU, memory, disk and network. We present <em>Tetris<\/em>, a multi-resource cluster scheduler that <em>packs <\/em>tasks to machines based on their requirements of all resource types. Doing so avoids resource <em>fragmentation<\/em> as well as <em>over-allocation<\/em> of the resources that are not explicitly allocated, both of which are drawbacks of current schedulers. <em>Tetris <\/em>adapts heuristics for the multi-dimensional bin packing problem to the context of cluster schedulers wherein task arrivals and machine availability change in an online manner and wherein task&#8217;s resource needs change with time and with the machine that the task is placed at. In addition, <em>Tetris <\/em>improves average job completion time by preferentially serving jobs that have less remaining work. We observe that fair allocations do not offer the best performance and the above heuristics are compatible with a large class of fairness policies; hence, we show how to simultaneously achieve good performance and fairness. 
Trace-driven simulations and deployment of our Apache YARN prototype on a 250 node cluster show gains of over 30% in makespan and job completion time while achieving nearly perfect fairness.<\/p>\n<\/div>\n<p><!-- .asset-content --><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Tasks in modern data-parallel clusters have highly diverse resource requirements along CPU, memory, disk and network. We present Tetris, a multi-resource cluster scheduler that packs tasks to machines based on their requirements of all resource types. Doing so avoids resource fragmentation as well as over-allocation of the resources that are not explicitly allocated, both of [&hellip;]<\/p>\n","protected":false},"featured_media":0,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","footnotes":""},"msr-content-type":[3],"msr-research-highlight":[],"research-area":[13547],"msr-publication-type":[193716],"msr-product-type":[],"msr-focus-area":[],"msr-platform":[],"msr-download-source":[],"msr-locale":[268875],"msr-post-option":[],"msr-field-of-study":[],"msr-conference":[],"msr-journal":[],"msr-impact-theme":[],"msr-pillar":[],"class_list":["post-166828","msr-research-item","type-msr-research-item","status-publish","hentry","msr-research-area-systems-and-networking","msr-locale-en_us"],"msr_publishername":"","msr_edition":"","msr_affiliation":"","msr_published_date":"2014-8-1","msr_host":"","msr_duration":"","msr_version":"","msr_speaker":"","msr_other_contributors":"","msr_booktitle":"","msr_pages_string":"","msr_chapter":"","msr_isbn":"","msr_journal":"","msr_volume":"","msr_number":"","msr_editors":"","msr_series":"","msr_issue":"","msr_organization":"","msr_how_published":"","msr_notes":"","msr_highlight_text":"","msr_release_tracker_id":"","msr_original_fields_of_study":"","msr_download_urls":"","msr_external_url":"","msr_secondary_video_url":"","msr_longbiography":
"","msr_microsoftintellectualproperty":1,"msr_main_download":"204735","msr_publicationurl":"","msr_doi":"","msr_publication_uploader":[{"type":"file","viewUrl":"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2016\/02\/tetris_sigcomm14.pdf","id":"204735","title":"tetris_sigcomm14.pdf","label_id":"243109","label":0}],"msr_related_uploader":"","msr_attachments":[],"msr-author-ordering":[{"type":"text","value":"Robert Grandl","user_id":0,"rest_url":false},{"type":"user_nicename","value":"Ganesh Ananthanarayanan","user_id":31834,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Ganesh Ananthanarayanan"},{"type":"user_nicename","value":"Srikanth Kandula","user_id":33707,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Srikanth Kandula"},{"type":"user_nicename","value":"Sriram Rao","user_id":33712,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Sriram Rao"},{"type":"text","value":"Aditya Akella","user_id":0,"rest_url":false}],"msr_impact_theme":[],"msr_research_lab":[],"msr_event":[],"msr_group":[144899,684024],"msr_project":[394250,239762],"publication":[],"video":[],"download":[],"msr_publication_type":"inproceedings","related_content":{"projects":[{"ID":394250,"post_title":"Cluster Resource Management","post_name":"cluster-resource-management","post_type":"msr-project","post_date":"2017-06-28 11:47:21","post_modified":"2020-03-13 16:49:56","post_status":"publish","permalink":"https:\/\/www.microsoft.com\/en-us\/research\/project\/cluster-resource-management\/","post_excerpt":"We are focused on building a scale-out, predictable, resource management substrate for big-data workloads.\u00a0 To this end, we started with providing predictable allocation SLOs for jobs that have completion time requirements, and then focused on improving cluster efficiency. 
Using Apache Hadoop YARN as the base, we have built a scale-out fabric by composing the following projects: 1. Preemption (YARN-45): We added work-conserving preemption to YARN to improve cluster utilization. 2. Rayon (YARN-1051): We added a&hellip;","_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/394250"}]}},{"ID":239762,"post_title":"Cluster scheduling","post_name":"cluster-scheduling","post_type":"msr-project","post_date":"2020-02-28 19:03:12","post_modified":"2020-03-13 16:55:01","post_status":"publish","permalink":"https:\/\/www.microsoft.com\/en-us\/research\/project\/cluster-scheduling\/","post_excerpt":"We consider various scheduling problems that arise in large clusters including multi-resource packing and dependency-aware scheduling. Our solutions have analytical foundations and are used in Microsoft's data-parallel clusters. Some have also shipped with YARN. Skim through the publications for more details. A closely related project, led by our applied scientist partner team, is 
here.","_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/239762"}]}}]},"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/166828","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-research-item"}],"version-history":[{"count":1,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/166828\/revisions"}],"predecessor-version":[{"id":386036,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/166828\/revisions\/386036"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=166828"}],"wp:term":[{"taxonomy":"msr-content-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-content-type?post=166828"},{"taxonomy":"msr-research-highlight","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-highlight?post=166828"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=166828"},{"taxonomy":"msr-publication-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-publication-type?post=166828"},{"taxonomy":"msr-product-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-product-type?post=166828"},{"taxonomy":"msr-focus-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-focus-area?post=166828"},{"taxonomy":"msr-platform","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-platform?post=166828"},{"taxonomy":"msr-download-s
ource","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-download-source?post=166828"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=166828"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=166828"},{"taxonomy":"msr-field-of-study","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-field-of-study?post=166828"},{"taxonomy":"msr-conference","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-conference?post=166828"},{"taxonomy":"msr-journal","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-journal?post=166828"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=166828"},{"taxonomy":"msr-pillar","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-pillar?post=166828"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}