{"id":1128357,"date":"2025-02-10T14:48:43","date_gmt":"2025-02-10T22:48:43","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-research-item&p=1128357"},"modified":"2025-02-18T09:44:19","modified_gmt":"2025-02-18T17:44:19","slug":"tuna-tuning-unstable-and-noisy-cloud-applications","status":"publish","type":"msr-research-item","link":"https:\/\/www.microsoft.com\/en-us\/research\/publication\/tuna-tuning-unstable-and-noisy-cloud-applications\/","title":{"rendered":"TUNA: Tuning Unstable and Noisy Cloud Applications"},"content":{"rendered":"

Autotuning plays a pivotal role in optimizing the performance of systems, particularly in large-scale cloud deployments, and has been used to improve the performance of a number of systems including databases, key-value stores, and operating systems. We find that one of the main challenges in performing autotuning in the cloud arises from performance variability or noise in system measurements. We first investigate the extent to which noise slows down autotuning and find that as little as 5% noise can lead to a 2.5x slowdown in converging to the best-performing configuration. We also measure the magnitude of noise in cloud computing settings and find that, while some components (CPU, disk) have almost no performance variability, there are still sources of significant variability (caches, memory). Additionally, we find that variability leads autotuning to find unstable configurations: for some workloads, as many as 63.3% of configurations selected as “best” during tuning can degrade by 30% or more when deployed. 
Using this as motivation, this paper proposes a novel approach to improve the efficiency of autotuning systems by (a) detecting and removing outlier configurations, and (b) using ML-based approaches to provide a more stable true signal of de-noised experiment results to the optimizer. The resulting system, TUNA (Tuning Unstable and Noisy Cloud Applications), enables faster convergence and robust configurations. We find that configurations learned using TUNA perform better, with lower standard deviations during deployment, compared to traditional sampling methodologies. Tuning PostgreSQL running an enterprise production workload, we find that TUNA can lead to 1.88x lower running time on average with 2.58x lower standard deviation compared to traditional sampling methodologies. TUNA will be incorporated into the MLOS project.<\/p>\n","protected":false},"excerpt":{"rendered":"

Autotuning plays a pivotal role in optimizing the performance of systems, particularly in large-scale cloud deployments, and has been used to improve the performance of a number of systems including databases, key-value stores, and operating systems. We find that one of the main challenges in performing autotuning in the cloud arises from performance variability or […]<\/p>\n","protected":false},"featured_media":0,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","footnotes":""},"msr-content-type":[3],"msr-research-highlight":[],"research-area":[13547],"msr-publication-type":[193716],"msr-product-type":[],"msr-focus-area":[],"msr-platform":[],"msr-download-source":[],"msr-locale":[268875],"msr-post-option":[269148,269142],"msr-field-of-study":[246691],"msr-conference":[267387],"msr-journal":[],"msr-impact-theme":[],"msr-pillar":[],"class_list":["post-1128357","msr-research-item","type-msr-research-item","status-publish","hentry","msr-research-area-systems-and-networking","msr-locale-en_us","msr-post-option-approved-for-river","msr-post-option-include-in-river","msr-field-of-study-computer-science"],"msr_publishername":"","msr_edition":"","msr_affiliation":"","msr_published_date":"2025-3-30","msr_host":"","msr_duration":"","msr_version":"","msr_speaker":"","msr_other_contributors":"","msr_booktitle":"","msr_pages_string":"","msr_chapter":"","msr_isbn":"","msr_journal":"","msr_volume":"","msr_number":"","msr_editors":"","msr_series":"","msr_issue":"","msr_organization":"","msr_how_published":"","msr_notes":"","msr_highlight_text":"","msr_release_tracker_id":"","msr_original_fields_of_study":"","msr_download_urls":"","msr_external_url":"","msr_secondary_video_url":"","msr_longbiography":"","msr_microsoftintellectualproperty":1,"msr_main_download":"","msr_publicationurl":"","msr_doi":"","msr_publication_uploader":[{"type":"file","viewUrl":"https:\/\/w
ww.microsoft.com\/en-us\/research\/uploads\/prod\/2025\/02\/TUNA.pdf","id":"1130118","title":"tuna","label_id":"243109","label":0}],"msr_related_uploader":[{"type":"url","viewUrl":"false","id":"false","title":"https:\/\/www.microsoft.com\/en-us\/research\/publication\/mlos-in-action-bridging-the-gap-between-experimentation-and-auto-tuning-in-the-cloud\/","label_id":"243118","label":0},{"type":"url","viewUrl":"false","id":"false","title":"https:\/\/www.microsoft.com\/en-us\/research\/publication\/performance-roulette-how-cloud-weather-affects-ml-based-system-optimization\/","label_id":"243118","label":0},{"type":"url","viewUrl":"false","id":"false","title":"https:\/\/www.microsoft.com\/en-us\/research\/publication\/llamatune-sample-efficient-dbms-configuration-tuning\/","label_id":"243118","label":0}],"msr_attachments":[{"id":1130118,"url":"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2025\/02\/TUNA.pdf"},{"id":1128366,"url":"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2025\/02\/TUNA_EuroSys_2025_Submitted-1.pdf"}],"msr-author-ordering":[{"type":"text","value":"Johannes Freischuetz","user_id":0,"rest_url":false},{"type":"text","value":"Konstantinos Kanellis","user_id":0,"rest_url":false},{"type":"user_nicename","value":"Brian Kroth","user_id":40024,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Brian Kroth"},{"type":"user_nicename","value":"Shivaram Venkataraman","user_id":37002,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Shivaram 
Venkataraman"}],"msr_impact_theme":[],"msr_research_lab":[],"msr_event":[],"msr_group":[684024],"msr_project":[],"publication":[],"video":[],"download":[],"msr_publication_type":"inproceedings","related_content":[],"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/1128357","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-research-item"}],"version-history":[{"count":5,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/1128357\/revisions"}],"predecessor-version":[{"id":1129122,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/1128357\/revisions\/1129122"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=1128357"}],"wp:term":[{"taxonomy":"msr-content-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-content-type?post=1128357"},{"taxonomy":"msr-research-highlight","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-highlight?post=1128357"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=1128357"},{"taxonomy":"msr-publication-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-publication-type?post=1128357"},{"taxonomy":"msr-product-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-product-type?post=1128357"},{"taxonomy":"msr-focus-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-focus-area?post=1128357"},{"taxonomy":"msr-platform","embeddable":true,"href":"https:\/\/www.
microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-platform?post=1128357"},{"taxonomy":"msr-download-source","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-download-source?post=1128357"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=1128357"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=1128357"},{"taxonomy":"msr-field-of-study","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-field-of-study?post=1128357"},{"taxonomy":"msr-conference","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-conference?post=1128357"},{"taxonomy":"msr-journal","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-journal?post=1128357"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=1128357"},{"taxonomy":"msr-pillar","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-pillar?post=1128357"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}