{"id":573039,"date":"2019-03-12T17:02:33","date_gmt":"2019-03-13T00:02:33","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-project&p=573039"},"modified":"2023-06-06T15:49:16","modified_gmt":"2023-06-06T22:49:16","slug":"resource-central","status":"publish","type":"msr-project","link":"https:\/\/www.microsoft.com\/en-us\/research\/project\/resource-central\/","title":{"rendered":"Machine Learning for Systems and Tiered AIOps"},"content":{"rendered":"

In this project, we are investigating the use of machine learning (ML) to improve computer systems, and cloud platforms in particular, rather than to mimic human behavior.\u00a0 As a first step in this direction, we built Resource Central<\/strong>, a general ML and prediction-serving system that we have deployed in all Azure Compute clusters worldwide.\u00a0 It trains ML models offline and uses them to produce predictions online.\u00a0 Other Azure components can use these predictions to improve resource, performance, and availability management.\u00a0 For example, the server defragmentation engine and the VM scheduler already use predictions from Resource Central (e.g., VM lifetime, VM migration blackout\/brownout times) in production.\u00a0 We have recently expanded the project’s scope to AIOps and, in particular, to the notion of Tiered AIOps.\u00a0 The goal is to create systems support for ML-driven management of cloud platforms and for non-expert manual intervention when the ML fails in any way.\u00a0 As part of the Tiered AIOps effort, we have been exploring the use of large language models.<\/p>\n

This project is a close collaboration between Azure and E+D.<\/p>\n","protected":false},"excerpt":{"rendered":"

Resource Central\u00a0is a general ML and prediction-serving system deployed in Azure Compute.\u00a0 It trains ML models offline and uses them to produce predictions online.<\/p>\n","protected":false},"featured_media":0,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","footnotes":""},"research-area":[13556,13547],"msr-locale":[268875],"msr-impact-theme":[],"msr-pillar":[],"class_list":["post-573039","msr-project","type-msr-project","status-publish","hentry","msr-research-area-artificial-intelligence","msr-research-area-systems-and-networking","msr-locale-en_us","msr-archive-status-active"],"msr_project_start":"2016-09-01","related-publications":[431325,672777,694536,702511,737704,813553,632832],"related-downloads":[],"related-videos":[],"related-groups":[],"related-events":[],"related-opportunities":[],"related-posts":[],"related-articles":[],"tab-content":[],"slides":[],"related-researchers":[{"type":"user_nicename","display_name":"Daniel S. 
Berger","user_id":38892,"people_section":"Section name 0","alias":"daberg"},{"type":"user_nicename","display_name":"Ricardo Bianchini","user_id":33393,"people_section":"Section name 0","alias":"ricardob"},{"type":"user_nicename","display_name":"Rodrigo Fonseca","user_id":40429,"people_section":"Section name 0","alias":"rofons"},{"type":"user_nicename","display_name":"Alok Kumbhare","user_id":36086,"people_section":"Section name 0","alias":"Alok Kumbhare"},{"type":"user_nicename","display_name":"Pedro Las-Casas","user_id":40465,"people_section":"Section name 0","alias":"pedrobr"}],"msr_research_lab":[199565],"msr_impact_theme":[],"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/573039"}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-project"}],"version-history":[{"count":11,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/573039\/revisions"}],"predecessor-version":[{"id":946209,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/573039\/revisions\/946209"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=573039"}],"wp:term":[{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=573039"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=573039"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=573039"},{"taxonomy":"msr-pillar","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-pillar?post=573039"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}",
"templated":true}]}}