{"id":982158,"date":"2023-11-08T12:39:02","date_gmt":"2023-11-08T20:39:02","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-project&p=982158"},"modified":"2024-08-30T10:59:53","modified_gmt":"2024-08-30T17:59:53","slug":"finding-adversarial-inputs-for-heuristics","status":"publish","type":"msr-project","link":"https:\/\/www.microsoft.com\/en-us\/research\/project\/finding-adversarial-inputs-for-heuristics\/","title":{"rendered":"MetaOpt: A Comprehensive Heuristic Analysis and Optimization Tool"},"content":{"rendered":"\n
\n
\n
\n\t
\n\t\t
\n\t\t\t\"MetaOpt\t\t<\/div>\n\t\t\n\t\t
\n\t\t\t\n\t\t\t
\n\t\t\t\t\n\t\t\t\t
\n\t\t\t\t\t\n\t\t\t\t\t
\n\t\t\t\t\t\t
\n\t\t\t\t\t\t\t\n\t\t\t\t\t\t\t\n\n

MetaOpt<\/h1>\n\n\n\n

Towards efficient heuristic design with quantifiable, confident performance<\/h2>\n\n\n\n

<\/p>\n\n\n\n

We use heuristics all the time across many systems, including those that are critical to production services. Production systems use heuristics because they are faster or scale better than their optimal counterparts. But practitioners often don\u2019t know the performance gap between a heuristic and the optimal (or another heuristic) in realistic scenarios. We present MetaOpt<\/strong>, a system that helps analyze heuristics.<\/p>\n\n\t\t\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t<\/div>\n\t\t<\/div>\n\t<\/div>\n<\/section>\n<\/div>\n<\/div>\n<\/div>\n\n\n\n\n\n

We use heuristics all the time across many systems, including those that are critical to production services. Production systems use heuristics because they are faster or scale better than their optimal counterparts. But practitioners often don\u2019t know the performance gap between a heuristic and the optimal (or another heuristic) in realistic scenarios.<\/p>\n\n\n\n

In many cases, theoreticians have also been unable to prove tight bounds for such heuristics. The first-fit-decreasing heuristic for vector bin packing is one example: theoreticians have studied it for over 30 years and have not yet found a tight bound that works across all problem sizes<\/strong> (teaser: our tool, MetaOpt, enabled us to prove a tighter bound and to design a constructive proof of it).<\/p>\n\n\n\n

A heuristic analyzer not only helps practitioners understand the performance gaps of their heuristics, but also helps them find practical situations where their heuristics underperform and deploy workarounds to mitigate the impact. They can analyze the decisions their heuristics make that cause these performance gaps and modify them to prevent problematic cases. They can also see what happens to the performance gap when they combine multiple heuristics to derive a better one. But the use cases for such a tool don\u2019t stop here.<\/strong><\/p>\n\n\n\n

\n

MetaOpt pioneers an area of research on scalable, general, and easy-to-use<\/em><\/strong> tools that enable users to analyze and explain the performance differences across competing algorithms and to improve these algorithms before they are deployed in high-stakes situations.<\/p>\n<\/blockquote>\n\n\n\n

We present MetaOpt, a system that helps us analyze the heuristics we have in production. MetaOpt enables the use cases mentioned earlier and much more. We have used it to analyze some of the popular machine learning models we have in production at Microsoft; we have used it as \u201ca helper for theorem proving\u201d<\/strong> to prove tighter performance bounds for heuristics in domains such as vector bin packing and packet scheduling; we have used it to analyze probabilistic heuristics; and we are now investigating how to use it for capacity planning in our WAN.<\/p>\n\n\n\n

MetaOpt pioneers an area of research on scalable, general, and easy-to-use<\/em> tools that let users analyze and explain the performance differences between competing algorithms and help operators improve their heuristics before they deploy to production. While others have studied individual heuristics (e.g., congestion control and network traffic), none of those approaches are general, provably optimal, or easy to use. MetaOpt is the first general-purpose<\/em> and scalable<\/em> tool that enables users to analyze a broad class of practical heuristics through easy-to-use abstractions<\/em>. <\/p>\n\n\n\n

\n

We believe MetaOpt can deliver significant benefits: as a productivity booster<\/strong> for those designing or studying heuristics; as a risk-analysis engine<\/strong> for evaluating heuristics before we deploy them in production; and as a tool for explainable AI and active learning.<\/strong><\/p>\n<\/blockquote>\n\n\n\n

How do people use MetaOpt?<\/h2>\n\n\n\n

To analyze their heuristics, users specify the heuristic and the optimal (or another heuristic) as inputs to MetaOpt, which automatically and efficiently encodes them as inputs to a solver that finds performance gaps and the adversarial inputs that cause them. Because not all users know optimization theory, we have designed a higher-level abstraction for MetaOpt that allows users to express their heuristics through a few simple building blocks. Using this abstraction, we can also search for the decisions a heuristic makes that cause it to underperform, or for the properties of the inputs that trigger it to make bad decisions<\/strong>.<\/p>\n\n\n\n

\"MetaOpt<\/figure>
\n

The MetaOpt workflow involves four steps: (1) users encode the heuristic; (2) MetaOpt automatically rewrites the encoding into a single-level optimization; (3) it partitions the problem into smaller sub-problems to achieve scale; (4) it uses existing solvers to find the highest performance gap.<\/p>\n<\/div><\/div>\n\n\n\n

How does MetaOpt work?<\/h2>\n\n\n\n

MetaOpt is based on the notion of Stackelberg games \u2013 a well-known class of leader-follower games in game theory. In such games, a leader controls the inputs to one or more followers and chooses them to maximize its own payoff. Given this fixed <\/em>input from the leader, the followers respond by optimizing their own payoffs over the variables they control \u2013 variables the leader does not control but which influence the leader’s payoff. In the context of MetaOpt, the leader decides the inputs to two followers so as to maximize the performance difference between them; the followers are the algorithms we want to compare, and the internal variables they decide in turn determine the leader’s payoff.<\/p>\n\n\n\n
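The leader-follower setup can be illustrated with a deliberately tiny sketch (the function names and the two-link instance are ours, not MetaOpt's API): the leader enumerates candidate demand vectors, each follower solves its own allocation problem, and the leader keeps the input that maximizes the gap between an exhaustive "optimal" follower and a greedy first-fit "heuristic" follower. MetaOpt solves this as a bi-level optimization rather than by brute force; the enumeration here is only for illustration.

```python
from itertools import product

# Hypothetical toy instance: two links with fixed capacities.
CAPACITIES = [4.0, 6.0]

def heuristic(demands):
    """Greedy follower: place each demand, in order, on the first link with room."""
    free = list(CAPACITIES)
    served = 0.0
    for d in demands:
        for i, f in enumerate(free):
            if d <= f:
                free[i] -= d
                served += d
                break
    return served

def optimal(demands):
    """Optimal follower: brute-force every assignment of demands to links,
    silently dropping demands that do not fit their assigned link."""
    best = 0.0
    for assign in product(range(len(CAPACITIES)), repeat=len(demands)):
        free = list(CAPACITIES)
        served = 0.0
        for d, i in zip(demands, assign):
            if d <= free[i]:
                free[i] -= d
                served += d
        best = max(best, served)
    return best

def leader(grid, n_demands=3):
    """Leader: search the input space for the demand vector that maximizes
    the performance gap between the two followers."""
    best_gap, best_demands = 0.0, None
    for demands in product(grid, repeat=n_demands):
        gap = optimal(demands) - heuristic(demands)
        if gap > best_gap:
            best_gap, best_demands = gap, demands
    return best_gap, best_demands
```

On this toy instance, `leader(range(1, 7))` finds demand vectors where the greedy follower serves much less traffic than the optimal one; MetaOpt replaces the enumeration with a bi-level optimization it hands to a solver.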

\"MetaOpt<\/figure>\n\n\n\n

So are we done?<\/h2>\n\n\n\n

Not even close. Stackelberg games apply to a broad class of leaders and followers. Our current realization of MetaOpt is based on the subset of these games that we can represent and solve through bi-level optimization; this covers heuristics we can represent as feasibility or convex optimization problems, which already includes a broad range of heuristics in packet scheduling, traffic engineering, bin packing, and more. We plan to extend MetaOpt with support for other types of heuristics that do not fit this paradigm.<\/p>\n\n\n\n

The MetaOpt team is pursuing these extensions and expanding MetaOpt to enable new use cases. Who knows, we may even design a GPT-based plug-in for it \ud83d\ude09<\/p>\n\n\n\n

Why did we build MetaOpt?<\/h2>\n\n\n\n

We started the MetaOpt project in the spring of 2022 in response to a production need: to analyze one of the heuristics deployed in our Wide Area Network\u2019s traffic engineering solution. The team then focused on enabling users without a background in optimization theory to model their use cases in MetaOpt. We are currently working on scaling MetaOpt to larger problem instances, improving its usability, extending the types of heuristics it can support, and open-sourcing it to the community.<\/p>\n\n\n\n

Who are we?<\/h2>\n\n\n\n

The core team that works on MetaOpt at Microsoft is part of the Networking Research and Azure for Operators groups. We are: Behnaz Arzani, Ryan Beckett, Siva Kakarla, and Srikanth Kandula. MetaOpt was built in collaboration with Santiago Segarra from Rice University and our interns Pooria Namyar from USC, Solal Pirelli from EPFL, and Pantea Karimi from MIT. Roozbeh Boostandoost and Mohammad Hajiesmaili from UMass have recently started collaborating with the MetaOpt team.<\/p>\n\n\n\n

We collaborate extensively with production teams to help analyze heuristics we have in production. Specifically, we work with Elnaz Jalilipour, Sina Taheri, Himanshu Raj, and Umesh Krishnaswamy.<\/p>\n\n\n\n

Rodrigo Fonseca, Daniel Berger, Ankur Mallick, and Kevin Hsieh collaborate with us on using MetaOpt for explainable AI.<\/p>\n\n\n\n

The work on MetaOpt is the latest in a series of tools our team has designed to help analyze deployed heuristics. Our prior work in this space includes Aragog (a runtime verifier for distributed network functions), GRoot (the first verifier for DNS zone files), and FixRoute (a scalable, optimization-based solution for ensuring loop-free routing), among others. <\/p>\n\n\n\n

Who should I contact if I have a use case MetaOpt can help with?<\/h2>\n\n\n\n

Behnaz Arzani (bearzani@microsoft.com) and Pooria Namyar (Namyar@usc.edu) are the main contacts for the MetaOpt project.<\/p>\n\n\n\n\n\n

Do you have examples where you used MetaOpt?<\/h2>\n\n\n\n

Here are some example use cases and results:<\/p>\n\n\n\n

Comparing Heuristics in Traffic Engineering. <\/strong>We used MetaOpt to find the performance gaps of demand pinning and POP relative to the optimal multi-commodity flow algorithm. We compute the performance gap: the difference between the heuristic and the optimal, normalized by the total network capacity. The performance gap is a lower bound on the optimality gap, the worst-case gap between the two. <\/p>\n\n\n\n
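The normalization above is simple arithmetic; a one-line sketch makes it concrete (the function name and the numbers below are ours, chosen only for illustration):

```python
def normalized_gap(optimal_value, heuristic_value, total_capacity):
    """Performance gap as defined in the text: the difference between the
    optimal's and the heuristic's objective values, normalized by the
    total network capacity. Illustrative sketch only."""
    return (optimal_value - heuristic_value) / total_capacity

# Hypothetical values yielding a 33.9% relative performance gap.
gap = normalized_gap(95.0, 61.1, 100.0)
```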

We find that the demand pinning (DP) heuristic Microsoft uses for WAN traffic engineering and POP incur 33.9% and 20% relative performance gaps, respectively, on a large topology. This means there exist (and we can find) adversarial traffic demands that cause the demand pinning heuristic we use at Microsoft to use 33.9% more capacity than optimal. Network operators that use DP may need to over-provision the network by that much to satisfy such demands. By default, MetaOpt searches for adversarial demands among all possible demands.<\/p>\n\n\n\n

We can also constrain MetaOpt to search over realistic demands, which exhibit temporal locality and in which only a few node pairs communicate. The gap for POP and DP shrinks by less than 1% when we run MetaOpt with this constraint.<\/p>\n\n\n\n

Comparing heuristics in packet scheduling.<\/strong> We compared the packet scheduling algorithm SP-PIFO to PIFO. We compute and compare the priority-weighted average packet delay of the two algorithms, which penalizes them for increasing the delay of high-priority packets.<\/p>\n\n\n\n

MetaOpt shows there exists an input packet sequence on which SP-PIFO is 3X worse than PIFO. We also used MetaOpt to compare SP-PIFO and AIFO, two heuristics; AIFO emulates PIFO through a single FIFO queue. MetaOpt finds inputs for which AIFO incurs 6X more priority inversions than SP-PIFO. Such analyses can help designers weigh performance trade-offs against switch resource usage.<\/p>\n\n\n\n

Proving properties of and improving heuristics.<\/strong> We used MetaOpt to find adversarial inputs for various heuristics and analyzed these inputs to prove performance bounds for these heuristics or to improve them.<\/p>\n\n\n\n

We proved a new bound for vector bin packing.<\/em> Vector bin packing (VBP) heuristics try to minimize the number of bins they use. Theoreticians prove bounds on their approximation ratio: the worst-case ratio, over all inputs, of the number of bins the heuristic uses to the number the optimal uses. <\/p>\n\n\n\n

Recent work showed that 2-dimensional FFDSum (a greedy heuristic that scores balls by the sum of their dimensions, sorts them by that score, and adds each to the first bin it fits in) asymptotically approaches an approximation ratio of 2 (as the optimal approaches infinitely many bins). We proved the approximation ratio is always at least 2, even when the optimal requires a finite number of bins!<\/p>\n\n\n\n
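The FFDSum heuristic described above is short enough to sketch directly (the function name and toy items are ours; real VBP instances and MetaOpt's encoding of the heuristic look nothing like this):

```python
def ffd_sum(items, capacity=1.0):
    """First-Fit-Decreasing by sum score for vector bin packing (sketch).

    Each item is a tuple of per-dimension demands; every bin offers
    `capacity` in each dimension. Items are sorted in decreasing order of
    the sum of their dimensions (FFDSum's score) and each is placed in
    the first open bin with room in every dimension, opening a new bin
    if none fits. Returns the number of bins used.
    """
    bins = []  # remaining capacity per open bin, per dimension
    for item in sorted(items, key=sum, reverse=True):
        for b in bins:
            if all(rem >= d for rem, d in zip(b, item)):
                for i, d in enumerate(item):
                    b[i] -= d
                break
        else:
            bins.append([capacity - d for d in item])
    return len(bins)
```

For example, on the 2-dimensional items `[(0.5, 0.5), (0.6, 0.1), (0.4, 0.3)]` the sketch opens two bins, since the highest-scoring item `(0.5, 0.5)` leaves no room for `(0.6, 0.1)` in its bin.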

Proving a new bound for packet scheduling and improving the heuristic.<\/em> We analyzed adversarial inputs MetaOpt found for SP-PIFO and proved a lower bound on its priority-weighted average delay relative to PIFO. The bound is a function of the priority range and the number of packets.<\/p>\n\n\n\n

Adversarial inputs to SP-PIFO trigger priority inversions, which queue high-priority packets behind low-priority ones. We tested Modified-SP-PIFO, which splits the queues into groups, assigns each group a priority range, and runs SP-PIFO on each group independently. Modified-SP-PIFO reduces the performance gap of SP-PIFO by 2.5X.<\/p>\n\n\n\n
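The intuition behind the grouping fix can be sketched with a toy inversion counter (this is our illustration, not SP-PIFO's actual queue-bound adaptation mechanism): count rank pairs that leave the scheduler out of order, then observe that strict priority across rank groups confines inversions to within a group.

```python
def priority_inversions(dequeue_ranks):
    """Count pairs dequeued out of priority order (lower rank = higher priority)."""
    n = len(dequeue_ranks)
    return sum(1 for i in range(n) for j in range(i + 1, n)
               if dequeue_ranks[i] > dequeue_ranks[j])

def grouped_dequeue_order(dequeue_ranks, num_groups, max_rank):
    """Modified-SP-PIFO intuition (our sketch): serve rank groups in strict
    priority order, so inverted orderings survive only within a group."""
    width = max_rank // num_groups + 1  # ranks covered per group
    groups = {}
    for r in dequeue_ranks:
        groups.setdefault(r // width, []).append(r)
    # Strict priority across groups; original (possibly inverted) order within.
    return [r for g in sorted(groups) for r in groups[g]]
```

For example, the dequeue order `[3, 0, 2, 1]` has four inversions; splitting ranks 0 to 3 into two groups and serving the groups strictly leaves only one.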

We have also used MetaOpt in many other contexts, such as WAN capacity planning, analyzing machine learning models, and analyzing caching algorithms. We will publish these results soon.<\/p>\n\n\n\n

Who do I contact with questions?<\/h2>\n\n\n\n

Behnaz Arzani (bearzani@microsoft.com) and Pooria Namyar (Namyar@usc.edu) are the main contacts for the MetaOpt project.<\/p>\n\n\n","protected":false},"excerpt":{"rendered":"

We use heuristics all the time across many systems including those that are critical to production services. Production systems use heuristics because they are faster or scale better than their optimal counterparts. But practitioners often don\u2019t know the performance gap between the heuristic and the optimal, or another heuristic in realistic scenarios. We present MetaOpt, […]<\/p>\n","protected":false},"featured_media":982215,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","footnotes":""},"research-area":[13547],"msr-locale":[268875],"msr-impact-theme":[],"msr-pillar":[],"class_list":["post-982158","msr-project","type-msr-project","status-publish","has-post-thumbnail","hentry","msr-research-area-systems-and-networking","msr-locale-en_us","msr-archive-status-active"],"msr_project_start":"2022-03-01","related-publications":[897075,986397,1061313,1096563,1096575],"related-downloads":[1045926],"related-videos":[],"related-groups":[],"related-events":[],"related-opportunities":[],"related-posts":[],"related-articles":[],"tab-content":[],"slides":[],"related-researchers":[{"type":"user_nicename","display_name":"Behnaz Arzani","user_id":37320,"people_section":"Related people","alias":"bearzani"},{"type":"user_nicename","display_name":"Ryan Beckett","user_id":37775,"people_section":"Related people","alias":"rybecket"},{"type":"user_nicename","display_name":"Siva Kesava Reddy Kakarla","user_id":42540,"people_section":"Related people","alias":"sivakakarla"},{"type":"guest","display_name":"Pooria Namyar","user_id":982227,"people_section":"Related people","alias":""},{"type":"guest","display_name":"Solal Pirelli","user_id":982230,"people_section":"Related people","alias":""},{"type":"guest","display_name":"Santiago Segarra","user_id":982206,"people_section":"Related 
people","alias":""}],"msr_research_lab":[],"msr_impact_theme":[],"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/982158"}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-project"}],"version-history":[{"count":86,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/982158\/revisions"}],"predecessor-version":[{"id":1081668,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/982158\/revisions\/1081668"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media\/982215"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=982158"}],"wp:term":[{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=982158"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=982158"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=982158"},{"taxonomy":"msr-pillar","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-pillar?post=982158"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}