{"id":1022580,"date":"2024-05-13T09:00:00","date_gmt":"2024-05-13T16:00:00","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?p=1022580"},"modified":"2024-05-14T14:20:22","modified_gmt":"2024-05-14T21:20:22","slug":"enhanced-autoscaling-with-vasim-vertical-autoscaling-simulator-toolkit","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/research\/blog\/enhanced-autoscaling-with-vasim-vertical-autoscaling-simulator-toolkit\/","title":{"rendered":"Enhanced autoscaling with VASIM: Vertical Autoscaling Simulator Toolkit"},"content":{"rendered":"\n

This research was presented as a demonstration at the 40th IEEE International Conference on Data Engineering (ICDE 2024), one of the premier conferences on data and information engineering.<\/p>\n\n\n\n

Since the inception of cloud computing, autoscaling has been an essential technique for optimizing resources and performance. By dynamically adjusting the number of computing resources allocated to a service based on current demand, autoscaling ensures that the service can handle the load efficiently while optimizing costs. However, developing and fine-tuning autoscaling algorithms, which govern this process, present significant challenges. The complexity and cost associated with testing these algorithms can lead to inefficient resource management and impede the development of more effective autoscaling strategies.<\/p>\n\n\n\n


In our paper, \u201cVASIM: Vertical Autoscaling Simulator Toolkit,\u201d presented at ICDE 2024, we introduce a tool designed to address the complexities involved in assessing autoscaling algorithms. While existing simulation tools cover a range of capabilities, such as energy efficiency and fault tolerance, VASIM stands out by evaluating the critical recommender component within the algorithm, which suggests resource scaling actions based on usage data while balancing performance and cost. This enables developers to iterate more rapidly, improving algorithmic performance, resource efficiency, and cost savings.<\/p>\n\n\n\n

VASIM’s user-friendly interface simplifies the evaluation of autoscaling policies, as illustrated in Figure 1. The first steps are to upload historical data and define autoscaling policies, including the algorithm and its parameters, as shown in the left panel. The Simulation Run feature lets users modify algorithm parameters, which are imported via a configuration file, and run simulations on the selected trace. A results screen displays the CPU limits determined by the selected policies as well as the actual CPU usage measured against these limits. Additionally, VASIM reports fundamental metrics for the current simulation, such as throttling incidents, number of scaling operations, and amount of unused capacity, or slack.<\/p>\n\n\n\n
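The three metrics named above can be illustrated with a minimal Python sketch. This is not VASIM code: the `SimulationMetrics` class, the `summarize` function, and the sample numbers are hypothetical, and the definitions (a throttling incident is a sample where usage exceeds the limit, slack is the unused part of the limit) are simplifying assumptions.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class SimulationMetrics:
    throttling_incidents: int  # samples where usage exceeded the limit
    scaling_operations: int    # times the limit changed between samples
    average_slack: float       # mean unused capacity (limit - usage), in cores


def summarize(usage: Sequence[float], limits: Sequence[float]) -> SimulationMetrics:
    # Hypothetical helper: both series are assumed to share the same timestamps.
    if len(usage) != len(limits) or not usage:
        raise ValueError('usage and limits must be non-empty and equally long')
    throttled = sum(1 for u, l in zip(usage, limits) if u > l)
    scalings = sum(1 for prev, cur in zip(limits, limits[1:]) if cur != prev)
    slack = sum(max(l - u, 0.0) for u, l in zip(usage, limits)) / len(usage)
    return SimulationMetrics(throttled, scalings, slack)


# Made-up trace: CPU usage and the limit in force at each sample, in cores.
usage = [1.0, 1.5, 2.5, 2.0]
limits = [2.0, 2.0, 2.0, 3.0]
print(summarize(usage, limits))
```

In this made-up trace, one sample is throttled, the limit changes once, and the average slack is 0.625 cores.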
Figure 1. The VASIM user interface comprises a run simulation pane on the left and a results pane on the right.<\/figcaption><\/figure>\n\n\n\n

VASIM achieves several important goals:<\/p>\n\n\n\n

Resource efficiency and cost reduction. VASIM reduces costs by removing the need to test scaling operations in real time, which would be resource-intensive. This enables developers to adjust algorithms iteratively in a controlled, cost-efficient environment, accelerating development cycles. Because the tool lets users upload CPU performance history and algorithm parameters, it delivers the results of scaling operations across the entire workload in minutes rather than hours.<\/p>\n\n\n\n

Multi-objective optimization. It\u2019s challenging to develop an autoscaling method that balances conflicting objectives. VASIM makes this easier by applying Pareto optimization techniques, helping developers find a balance among key metrics. Figure 2 depicts scatter plots of tuning results: one for two metrics, average slack and average insufficient CPU, and one for three optimization objectives: slack, throttling, and number of scaling operations.<\/p>\n\n\n\n
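As a rough illustration of the Pareto idea, the sketch below filters a set of tuning results down to its non-dominated points, assuming both metrics are to be minimized. The function names and the sample values are hypothetical and are not taken from VASIM.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (average slack, average insufficient CPU)


def dominates(a: Point, b: Point) -> bool:
    # a dominates b if it is no worse on every metric and strictly better on one.
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points: List[Point]) -> List[Point]:
    # Keep every point that no other point dominates (simple O(n^2) scan).
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Made-up results, one point per parameter setting.
results = [(0.9, 0.05), (0.5, 0.20), (0.6, 0.30), (0.2, 0.60), (0.9, 0.10)]
print(pareto_front(results))
```

Here `(0.6, 0.30)` is dropped because `(0.5, 0.20)` is better on both metrics, and `(0.9, 0.10)` is dropped because `(0.9, 0.05)` has the same slack with less insufficient CPU. The remaining three points represent different tradeoffs, and choosing among them is left to the developer.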
Figure 2. The 2D diagram on the left shows a scatter plot of tuning with Pareto points. The 3D graph on the right shows a scatter plot with the three objectives.<\/figcaption><\/figure>\n\n\n\n

Recommender algorithm testing. VASIM simplifies the process of testing and evaluating recommendation algorithms across diverse workloads. Because all tuning jobs run in parallel, results arrive sooner, allowing users to adjust their recommender parameters efficiently as needed. To assess the algorithm\u2019s generalizability, we ran VASIM against 11 open cluster traces available for benchmarking, as well as internal product workload traces. This enabled us to evaluate the algorithms\u2019 robustness across a variety of workload types, including cyclical, bursty, and monotonic variations, demonstrating their reliability across different scenarios.<\/p>\n\n\n\n
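The sketch below shows one way parallel tuning over a parameter grid could look in Python. It is an illustration only: the trace, the toy recommender inside `run_simulation`, and the parameter names `buffer` and `window` are all assumptions, not VASIM's tuning API.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product
from typing import Dict, List, Sequence, Tuple

TRACE = [1.0, 1.2, 3.0, 2.8, 1.1, 0.9]  # made-up CPU usage in cores


def run_simulation(params: Tuple[float, int]) -> Dict[str, float]:
    # Hypothetical recommender: limit = buffer * peak usage over the
    # previous `window` samples (the current sample is not yet visible).
    buffer, window = params
    throttled = 0
    slack = 0.0
    for i, used in enumerate(TRACE):
        history = TRACE[max(0, i - window):i] or [used]
        limit = buffer * max(history)
        throttled += used > limit
        slack += max(limit - used, 0.0)
    return {'buffer': buffer, 'window': window,
            'throttled': throttled, 'avg_slack': slack / len(TRACE)}


def tune(buffers: Sequence[float], windows: Sequence[int]) -> List[Dict[str, float]]:
    # One simulation per parameter combination; map() keeps the grid order.
    grid = list(product(buffers, windows))
    with ThreadPoolExecutor() as pool:
        return list(pool.map(run_simulation, grid))


for row in tune([1.1, 1.5], [1, 3]):
    print(row)
```

A thread pool keeps the example short and portable. For CPU-bound simulations in CPython, a process pool (`ProcessPoolExecutor`) would typically be the better fit.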

Versatility and adaptability. VASIM provides users with the flexibility to modify components, experiment with recommendation strategies, and evaluate the impact of changes in a controlled and customizable environment. Figure 3 shows the results of a simulation run on the same algorithm and historical performance data but with different parameters. This versatility ensures that infrastructure engineers can tailor the system to meet their needs, enhancing the overall effectiveness of their autoscaling strategies.<\/p>\n\n\n\n
Figure 3. These graphs show VASIM running an identical algorithm on the same historical data but with varying parameters, affecting slack, throttling, and the frequency of scaling events. The objective is to maintain a minimal gap between the peak and the lowest resource utilization levels (the top of the bottom line and the bottom of the top line, respectively). A further goal is to reduce the response lag indicated by the trailing edges to the left of the lines. At the same time, it\u2019s important to minimize the number of scaling events to prevent disruptions in workload execution.<\/figcaption><\/figure>\n\n\n\n

Optimizing scalability and costs in Kubernetes environments<\/h2>\n\n\n\n

Our research on vertically autoscaling monolithic applications with a container-as-a-service algorithm helped us to better understand the tradeoffs between cost and availability that different algorithm variations introduce. Because VASIM\u2019s architecture resembles standard autoscaling architectures, such as the Kubernetes Vertical Pod Autoscaler (VPA), it allows us to test autoscaling algorithms for pods, applications, and virtual machine (VM) capacity. This is possible because these systems share similar components, including resource updaters, controllers, and recommenders. Although specific systems differ, their underlying architectures are similar enough that VASIM can effectively mimic them, as shown in Figure 4.<\/p>\n\n\n\n
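To make the shared components concrete, here is a minimal Python sketch of a recommender, an updater, and a controller loop replaying a trace. The class names, the peak-plus-buffer rule, and the windowing are hypothetical illustrations of the general pattern, not the VASIM or Kubernetes VPA implementation.

```python
from typing import List, Protocol, Sequence


class Recommender(Protocol):
    def recommend(self, recent_usage: List[float]) -> float: ...


class PeakPlusBuffer:
    # Hypothetical recommender: recent peak usage scaled by a safety buffer.
    def __init__(self, buffer: float = 1.2) -> None:
        self.buffer = buffer

    def recommend(self, recent_usage: List[float]) -> float:
        return self.buffer * max(recent_usage)


class Updater:
    # Applies a recommended limit and counts each change as a scaling event.
    def __init__(self, initial_limit: float) -> None:
        self.limit = initial_limit
        self.scaling_events = 0

    def apply(self, new_limit: float) -> None:
        if new_limit != self.limit:
            self.limit = new_limit
            self.scaling_events += 1


def control_loop(trace: Sequence[float], recommender: Recommender,
                 updater: Updater, window: int = 3) -> List[float]:
    # Controller: replay the trace, recording the limit in force at each
    # sample, then ask the recommender for the next limit.
    limits = []
    for i in range(len(trace)):
        limits.append(updater.limit)
        recent = list(trace[max(0, i - window + 1):i + 1])
        updater.apply(recommender.recommend(recent))
    return limits


updater = Updater(initial_limit=1.0)
print(control_loop([1.0, 2.0, 2.0, 1.0], PeakPlusBuffer(1.5), updater, window=2))
print(updater.scaling_events)
```

Because the recommender sits behind a small interface, a different strategy can be swapped in without touching the controller or updater, which is the kind of modularity the paragraph above describes.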
Figure 4. VASIM architecture mimics the main components of general autoscaling architectures, allowing users to parametrize those modules to fit their specific needs.<\/figcaption><\/figure>\n\n\n\n

Implications and looking ahead<\/h2>\n\n\n\n

Looking forward, we plan to broaden the scope of VASIM’s support beyond just CPUs to include a wide range of resources, such as memory, disk I\/O, and network bandwidth. This expansion will provide future users with a comprehensive understanding of system performance and enable them to make more accurate decisions regarding system management and resource optimization. Additionally, a deeper understanding of system performance will help inform proactive optimization strategies focused on maximizing system efficiency and performance.<\/p>\n","protected":false},"excerpt":{"rendered":"

Autoscaling can optimize cloud resource usage and costs by adjusting to demand. VASIM shows that simplifying testing and refinement of autoscaling algorithms can enable rapid development and evaluation of more efficient & cost-effective autoscaling strategies. <\/p>\n","protected":false},"author":37583,"featured_media":1022622,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","msr-author-ordering":[{"type":"user_nicename","value":"Anna Pavlenko","user_id":"40009"},{"type":"user_nicename","value":"Karla Saur","user_id":"39991"},{"type":"user_nicename","value":"Yiwen Zhu","user_id":"39438"},{"type":"user_nicename","value":"Brian Kroth","user_id":"40024"},{"type":"user_nicename","value":"Joyce Cahoon","user_id":"40012"},{"type":"user_nicename","value":"Jes\u00fas Camacho Rodr\u00edguez","user_id":"40693"}],"msr_hide_image_in_river":0,"footnotes":""},"categories":[1],"tags":[],"research-area":[13563,13560],"msr-region":[],"msr-event-type":[],"msr-locale":[268875],"msr-post-option":[243984],"msr-impact-theme":[],"msr-promo-type":[],"msr-podcast-series":[],"class_list":["post-1022580","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-research-blog","msr-research-area-data-platform-analytics","msr-research-area-programming-languages-software-engineering","msr-locale-en_us","msr-post-option-blog-homepage-featured"],"msr_event_details":{"start":"","end":"","location":""},"podcast_url":"","podcast_episode":"","msr_research_lab":[],"msr_impact_theme":[],"related-publications":[],"related-downloads":[],"related-videos":[],"related-academic-programs":[],"related-groups":[684024],"related-projects":[],"related-events":[],"related-researchers":[{"type":"user_nicename","value":"Anna Pavlenko","user_id":40009,"display_name":"Anna 
Pavlenko","author_link":"Anna Pavlenko<\/a>","is_active":false,"last_first":"Pavlenko, Anna","people_section":0,"alias":"annapa"},{"type":"user_nicename","value":"Yiwen Zhu","user_id":39438,"display_name":"Yiwen Zhu","author_link":"Yiwen Zhu<\/a>","is_active":false,"last_first":"Zhu, Yiwen","people_section":0,"alias":"yiwzh"},{"type":"user_nicename","value":"Joyce Cahoon","user_id":40012,"display_name":"Joyce Cahoon","author_link":"Joyce Cahoon<\/a>","is_active":false,"last_first":"Cahoon, Joyce","people_section":0,"alias":"jcahoon"},{"type":"user_nicename","value":"Jes\u00fas Camacho Rodr\u00edguez","user_id":40693,"display_name":"Jes\u00fas Camacho Rodr\u00edguez","author_link":"Jes\u00fas Camacho Rodr\u00edguez<\/a>","is_active":false,"last_first":"Camacho Rodr\u00edguez, Jes\u00fas","people_section":0,"alias":"jesusca"}],"msr_type":"Post","featured_image_thumbnail":"\"ICDE","byline":"","formattedDate":"May 13, 2024","formattedExcerpt":"Autoscaling can optimize cloud resource usage and costs by adjusting to demand. 
VASIM shows that simplifying testing and refinement of autoscaling algorithms can enable rapid development and evaluation of more efficient & cost-effective autoscaling strategies.","locale":{"slug":"en_us","name":"English","native":"","english":"English"},"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/1022580","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/users\/37583"}],"replies":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/comments?post=1022580"}],"version-history":[{"count":29,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/1022580\/revisions"}],"predecessor-version":[{"id":1031517,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/1022580\/revisions\/1031517"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media\/1022622"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=1022580"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/categories?post=1022580"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/tags?post=1022580"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=1022580"},{"taxonomy":"msr-region","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-region?post=1022580"},{"taxonomy":"msr-event-type","embeddable":true,"href":"https:\/\/www.microsoft.co
m\/en-us\/research\/wp-json\/wp\/v2\/msr-event-type?post=1022580"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=1022580"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=1022580"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=1022580"},{"taxonomy":"msr-promo-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-promo-type?post=1022580"},{"taxonomy":"msr-podcast-series","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-podcast-series?post=1022580"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}