paper<\/a> discusses how this works in more detail.<\/p>\n\n\n\nImplications and looking forward<\/h2>\n\n\n\n
Hyrax is the first fail-in-place system for cloud computing servers, and it paves the way for future improvements. One potential enhancement is to reconsider how we use memory regions with 20 GB\/sec of memory bandwidth. Instead of using them only for small VMs, we could allocate these regions to accommodate large data structures, such as by adding buffers for input-output devices that require more than 20 GB\/sec of bandwidth.<\/p>\n\n\n\n
Failing-in-place offers significant flexibility when it comes to repairs. For example, instead of making daily repair trips to individual servers scattered throughout a datacenter, we are exploring batched repairs, in which technicians would visit a row of server racks once every few weeks and address issues across multiple servers in a single trip. Batching could save time and resources, and it opens new research avenues for optimizing repair schedules that balance capacity loss against repair effort.<\/p>\n\n\n\n
Achieving sustainability goals demands collective effort across society. In this context, we introduce fail-in-place as a research direction for both datacenter hardware and software systems, directly tied to water and carbon efficiency. Beyond refining the fail-in-place concept itself and exploring new server designs, this paradigm also opens up pathways for making maintenance processes more environmentally friendly.<\/p>\n","protected":false},"excerpt":{"rendered":"
Managing server failures at the scale of a cloud platform is challenging. The Hyrax fail-in-place approach reduces the need for immediate repairs and creates a path toward lowering water consumption and carbon emissions in cloud datacenters. <\/p>\n","protected":false},"author":42183,"featured_media":956691,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","footnotes":""},"categories":[1],"tags":[],"research-area":[13547],"msr-region":[],"msr-event-type":[],"msr-locale":[268875],"msr-post-option":[],"msr-impact-theme":[],"msr-promo-type":[],"msr-podcast-series":[],"class_list":["post-956640","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-research-blog","msr-research-area-systems-and-networking","msr-locale-en_us"],"msr_event_details":{"start":"","end":"","location":""},"podcast_url":"","podcast_episode":"","msr_research_lab":[],"msr_impact_theme":[],"related-publications":[],"related-downloads":[],"related-videos":[],"related-academic-programs":[],"related-groups":[282170],"related-projects":[757045],"related-events":[],"related-researchers":[{"type":"user_nicename","value":"Daniel S. Berger","user_id":38892,"display_name":"Daniel S. Berger","author_link":"Daniel S. 
Berger<\/a>","is_active":false,"last_first":"Berger, Daniel S.","people_section":0,"alias":"daberg"},{"type":"guest","value":"marisa-you","user_id":"956952","display_name":"Marisa You","author_link":"Marisa You<\/a>","is_active":true,"last_first":"You, Marisa","people_section":0,"alias":"marisa-you"},{"type":"user_nicename","value":"Celine Irvene","user_id":40636,"display_name":"Celine Irvene","author_link":"Celine Irvene<\/a>","is_active":false,"last_first":"Irvene, Celine","people_section":0,"alias":"celineirvene"},{"type":"guest","value":"mark-jung","user_id":"956958","display_name":"Mark Jung","author_link":"Mark Jung<\/a>","is_active":true,"last_first":"Jung, Mark","people_section":0,"alias":"mark-jung"},{"type":"guest","value":"tyler-narmore","user_id":"956964","display_name":"Tyler Narmore","author_link":"Tyler Narmore<\/a>","is_active":true,"last_first":"Narmore, Tyler","people_section":0,"alias":"tyler-narmore"},{"type":"guest","value":"jacob-shapiro","user_id":"956970","display_name":"Jacob Shapiro","author_link":"Jacob Shapiro<\/a>","is_active":true,"last_first":"Shapiro, Jacob","people_section":0,"alias":"jacob-shapiro"},{"type":"user_nicename","value":"Luke Marshall","user_id":37386,"display_name":"Luke Marshall","author_link":"Luke Marshall<\/a>","is_active":false,"last_first":"Marshall, Luke","people_section":0,"alias":"lumarsha"},{"type":"guest","value":"savyasachi-samal","user_id":"956976","display_name":"Savyasachi Samal","author_link":"Savyasachi Samal<\/a>","is_active":true,"last_first":"Samal, Savyasachi","people_section":0,"alias":"savyasachi-samal"},{"type":"guest","value":"preetha-subbarayalu","user_id":"956982","display_name":"Preetha Subbarayalu","author_link":"Preetha Subbarayalu<\/a>","is_active":true,"last_first":"Subbarayalu, Preetha","people_section":0,"alias":"preetha-subbarayalu"},{"type":"guest","value":"ashish-raniwala","user_id":"956988","display_name":"Ashish Raniwala","author_link":"Ashish 
Raniwala<\/a>","is_active":true,"last_first":"Raniwala, Ashish","people_section":0,"alias":"ashish-raniwala"},{"type":"guest","value":"brijesh-warrier","user_id":"956994","display_name":"Brijesh Warrier","author_link":"Brijesh Warrier<\/a>","is_active":true,"last_first":"Warrier, Brijesh","people_section":0,"alias":"brijesh-warrier"},{"type":"user_nicename","value":"Ricardo Bianchini","user_id":33393,"display_name":"Ricardo Bianchini","author_link":"Ricardo Bianchini<\/a>","is_active":false,"last_first":"Bianchini, Ricardo","people_section":0,"alias":"ricardob"}],"msr_type":"Post","featured_image_thumbnail":"
","byline":"","formattedDate":"July 27, 2023","formattedExcerpt":"Managing server failures at the scale of a cloud platform is challenging. The Hyrax fail-in-place approach reduces the need for immediate repairs and creates a path toward lowering water consumption and carbon emissions in cloud datacenters.","locale":{"slug":"en_us","name":"English","native":"","english":"English"},"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/956640","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/users\/42183"}],"replies":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/comments?post=956640"}],"version-history":[{"count":22,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/956640\/revisions"}],"predecessor-version":[{"id":956949,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/956640\/revisions\/956949"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media\/956691"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=956640"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/categories?post=956640"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/tags?post=956640"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=956640"},{"taxonomy":"msr-region","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\
/wp\/v2\/msr-region?post=956640"},{"taxonomy":"msr-event-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-event-type?post=956640"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=956640"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=956640"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=956640"},{"taxonomy":"msr-promo-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-promo-type?post=956640"},{"taxonomy":"msr-podcast-series","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-podcast-series?post=956640"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}