Heuristics are everywhere, including in systems that are critical to production services. Production systems rely on heuristics because they are faster or scale better than their optimal counterparts. But practitioners often do not know how large the performance gap between a heuristic and the optimal algorithm (or another heuristic) is in realistic scenarios.
In many cases, theoreticians have also been unable to prove tight bounds for such heuristics. The first-fit decreasing heuristic for vector bin packing is one example: theoreticians have studied it for over 30 years and have not yet found a tight bound that works across all problem sizes (teaser: MetaOpt, our tool, enabled us to prove a tighter bound and to design a constructive proof in the process).
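For readers who have not seen it, first-fit decreasing is deceptively simple to state, which makes the difficulty of bounding it all the more striking. Below is a minimal sketch of one common variant for the vector (multi-dimensional) case, assuming items are given as tuples of resource demands and bins have a fixed capacity per dimension; the sort key and tie-breaking differ across the variants studied in the literature.

```python
def first_fit_decreasing(items, capacity):
    """Vector bin packing: consider items in decreasing order and place each
    one into the first bin where it fits in every dimension.

    items    -- list of tuples; each tuple is a d-dimensional resource demand
    capacity -- tuple with the bin's capacity in each dimension
    """
    # One common ordering: sort by the sum of an item's dimensions, largest first.
    ordered = sorted(items, key=sum, reverse=True)
    bins = []  # each bin is the list of items placed in it

    def fits(bin_items, item):
        # An item fits if, in every dimension, used + demand <= capacity.
        used = [sum(dim) for dim in zip(*bin_items)] if bin_items else [0.0] * len(capacity)
        return all(u + x <= c for u, x, c in zip(used, item, capacity))

    for item in ordered:
        for b in bins:
            if fits(b, item):
                b.append(item)
                break
        else:
            bins.append([item])  # no open bin fits: open a new one
    return bins


# Example: 2-dimensional items (say, CPU and memory demands) and unit-capacity bins.
print(len(first_fit_decreasing([(0.6, 0.2), (0.5, 0.5), (0.3, 0.7), (0.2, 0.1)], (1.0, 1.0))))  # -> 2
```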
A heuristic analyzer not only helps practitioners understand the performance gaps of their heuristics, but also helps them find practical situations where their heuristics underperform and deploy workarounds to mitigate the impact. They can analyze the decisions their heuristics make that cause these performance gaps and modify them to prevent problematic cases. They can also see what happens to the performance gap if they combine multiple heuristics to derive a better one. And the use cases for such a tool don't stop there.
We present MetaOpt, a system that helps us analyze heuristics we have in production. MetaOpt enables the use cases mentioned earlier and much more. We use it to analyze some of the popular machine learning models we have in production at Microsoft; we have used it as a “helper for theorem proving” to prove tighter performance bounds for heuristics in domains such as vector bin packing and packet scheduling; we have used it to analyze probabilistic heuristics as well; and we are now investigating how to use it for capacity planning in our WAN.
MetaOpt pioneers an area of research on scalable, general, and easy-to-use tools that enable users to analyze and explain the performance differences between competing algorithms and to help operators improve their heuristics before they deploy them to production. While prior work has studied individual heuristics (e.g., for congestion control and network traffic), none of those approaches is general, provably optimal, or easy to use. MetaOpt is the first general-purpose, scalable tool that lets users analyze a broad class of practical heuristics through easy-to-use abstractions.
We believe MetaOpt can have significant benefits as a productivity booster for those who design or study heuristics; as a risk-analysis engine for evaluating heuristics before we deploy them in production; and as a tool for explainable AI and active learning.
How do people use MetaOpt?
To analyze their heuristics, users specify the heuristic and the optimal algorithm (or another heuristic) as inputs to MetaOpt, and MetaOpt automatically and efficiently encodes these as inputs to a solver that finds performance gaps and the adversarial inputs that cause them. Not all users know optimization theory, so we have designed a higher-level abstraction for MetaOpt that allows users to express their heuristics using a few simple building blocks. Using this abstraction, users can also search for the decisions the heuristic makes that cause it to underperform, or for the properties of the inputs that trigger those bad decisions.
The MetaOpt workflow involves four steps: (1) users encode the heuristic; (2) MetaOpt automatically applies rewrites to obtain a single-level optimization; (3) it partitions the problem into smaller sub-problems to achieve scale; and (4) it uses existing solvers to find the largest performance gap.
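To make this concrete without reproducing MetaOpt's internals, here is a deliberately naive sketch of the idea being automated: given the two algorithms as black boxes, look for an input that maximizes the gap between them. MetaOpt replaces the random sampling below with solver-based search over an exact encoding, so it finds provably maximal gaps rather than sampled ones; the `heuristic`, `optimal`, and `sample_input` callables are hypothetical placeholders a user would supply for their own problem.

```python
import random

def find_adversarial_input(heuristic, optimal, sample_input, trials=10_000):
    """Naive stand-in for MetaOpt's search: sample inputs and keep the one
    with the largest gap between the optimal value and the heuristic's value.
    MetaOpt instead encodes both algorithms into a single optimization problem
    and asks a solver for the input that provably maximizes this gap."""
    worst_input, worst_gap = None, float("-inf")
    for _ in range(trials):
        x = sample_input()
        gap = optimal(x) - heuristic(x)  # how much the heuristic leaves on the table
        if gap > worst_gap:
            worst_input, worst_gap = x, gap
    return worst_input, worst_gap


if __name__ == "__main__":
    # Toy example (purely illustrative): pick one of several options by value.
    def sample_input():
        return [random.uniform(0.0, 1.0) for _ in range(8)]

    def heuristic(values):
        return values[0]      # a "take the first option" heuristic

    def optimal(values):
        return max(values)    # always takes the best option

    x, gap = find_adversarial_input(heuristic, optimal, sample_input)
    print(f"largest sampled gap: {gap:.3f}")
```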
How does MetaOpt work?
MetaOpt is based on the notion of Stackelberg games, a well-known class of leader-follower games in game theory. In such games, a leader maximizes their payoff and controls the inputs to one or more followers. Given the leader's input, each follower responds by optimizing its own payoff and choosing values for the variables under its control; the leader does not control these variables, but they influence the leader's payoff. In the context of MetaOpt, the leader maximizes the performance difference between two followers and chooses the inputs to these followers to achieve this goal; the followers are the algorithms we want to compare, and the internal decisions they make in turn determine the leader's payoff.
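In optimization terms, this leader-follower structure is a bi-level problem. Schematically (this shows the general shape, not MetaOpt's exact encoding for any particular heuristic), the leader chooses the input $x$, each follower $i$ best-responds with its own decisions $y_i$, and the leader's payoff is the gap between the values the followers achieve:

\[
\begin{aligned}
\max_{x \in \mathcal{X}} \quad & f_1(x, y_1^\star) - f_2(x, y_2^\star) \\
\text{s.t.} \quad & y_1^\star \in \arg\max_{y_1 \in \mathcal{F}_1(x)} f_1(x, y_1), \\
& y_2^\star \in \arg\max_{y_2 \in \mathcal{F}_2(x)} f_2(x, y_2).
\end{aligned}
\]

Here $\mathcal{X}$ is the set of admissible inputs (e.g., traffic demands or items to pack), $\mathcal{F}_i(x)$ is follower $i$'s feasible set for input $x$, and $f_i$ is its payoff; follower 1 could be the optimal algorithm and follower 2 the heuristic under study.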
So are we done?
Not even close. Stackelberg games apply to a broad class of leaders and followers. Our current realization of MetaOpt handles the subset of these games that we can represent and solve through bi-level optimization: heuristics we can express as a feasibility or convex optimization problem, which already covers a broad range of heuristics in packet scheduling, traffic engineering, bin packing, and more. We plan to extend MetaOpt to support other types of heuristics that do not fit this paradigm.
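As a sketch of why convexity matters here, and of the kind of rewrite step (2) of the workflow refers to: when a follower's problem is convex, a standard bi-level technique is to replace the inner arg-max with its KKT optimality conditions, collapsing the nested problem into a single-level one over the leader's and followers' variables together. For a follower maximizing a concave payoff $f(x, y)$ subject to convex constraints $h_j(x, y) \le 0$ (and under a constraint qualification such as Slater's condition):

\[
y^\star \in \arg\max_{y}\{\, f(x,y) : h_j(x,y) \le 0 \ \forall j \,\}
\;\Longleftrightarrow\;
\exists\, \lambda \ge 0:\quad
\nabla_y f(x,y^\star) = \sum_j \lambda_j \nabla_y h_j(x,y^\star),\quad
h_j(x,y^\star) \le 0,\quad
\lambda_j\, h_j(x,y^\star) = 0 \ \ \forall j.
\]

Feasibility-style followers are simpler still, since they contribute only constraints. The complementarity terms $\lambda_j\, h_j(x, y^\star) = 0$ are typically what make the resulting single-level problem hard, which is part of why steps (3) and (4) lean on partitioning and on off-the-shelf solvers.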
The MetaOpt team is investigating these extensions and is evolving MetaOpt to enable new use cases. Who knows, we may even design a GPT-based plug-in for it 😉
Why did we build MetaOpt?
We started the MetaOpt project in the spring of 2022 in response to a production need: to analyze one of the heuristics deployed in our wide-area network's traffic engineering solution. The team then focused on enabling users who do not have a background in optimization theory to model their use cases in MetaOpt. We are currently working on scaling MetaOpt to larger problem instances, improving its usability, extending the types of heuristics it can support, and open-sourcing it to the community.
Who are we?
The core team working on MetaOpt at Microsoft is part of the Networking Research and Azure for Operators groups. We are: Behnaz Arzani, Ryan Beckett, Siva Kakarla, and Srikanth Kandula. MetaOpt was developed in collaboration with Santiago Segarra from Rice University and our interns Pooria Namyar from USC, Solal Pirelli from EPFL, and Pantea Karimi from MIT. Roozbeh Boostandoost and Mohammad Hajiesmaili from UMass have recently started collaborating with the MetaOpt team.
We collaborate extensively with production teams to help analyze the heuristics they have deployed. Specifically, we work with Elnaz Jalilipour, Sina Taheri, Himanshu Raj, and Umesh Krishnaswamy.
Rodrigo Fonseca, Daniel Berger, Ankur Mallick, and Kevin Hsieh collaborate with us on using MetaOpt for explainable AI.
MetaOpt is the latest in a series of tools our team has designed to help analyze deployed heuristics. Our prior work in this space includes Aragog (a runtime verifier for distributed network functions), GRoot (the first verifier for DNS zone files), and FixRoute (a scalable, optimization-based solution for ensuring loop-free routing), among others.
Who should I contact if I have a use-case MetaOpt can help with?
Behnaz Arzani (bearzani@microsoft.com) and Pooria Namyar (Namyar@usc.edu) are the main contacts for the MetaOpt project.