{"id":635229,"date":"2020-02-18T11:34:26","date_gmt":"2020-02-07T19:44:26","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?p=635229"},"modified":"2020-02-18T11:34:27","modified_gmt":"2020-02-18T19:34:27","slug":"ai-for-ai-metareasoning-for-modular-computing-systems","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/research\/blog\/ai-for-ai-metareasoning-for-modular-computing-systems\/","title":{"rendered":"AI for AI: Metareasoning for modular computing systems"},"content":{"rendered":"
A new document in a word processor can be a magical thing, a blank page onto which thoughts and ideas are put forth as quickly as we can input text. We can select words and phrases to underline or highlight, add images, shapes, and bulleted lists, and, when we need editorial help, run a grammar and spell checker. The experience can feel so seamless at times that perhaps we don't give much thought to how it all works.
Behind the scenes are many modules, individual components responsible for each of those specific actions and many others, all working together to make the entire system function. A change in one module can have a significant impact on overall system performance, which is why a whole team of engineers can be responsible for working out the specific configurations of a single module. The spell-checker module, for example, can have multiple configurations, including a lightweight version and a heavyweight version. The lightweight spell checker, designed for small compute environments like our phones, gives up accuracy in return for speed; the heavyweight spell checker is very accurate but slower, requiring more compute, like that offered by laptop and desktop environments.
It would make sense to always use the lightweight spell checker when running the word processor in small-compute environments and the heavyweight spell checker when more compute is available. For the most part, that's how software configuration is done today: through trial and error, a configuration is determined for a target environment and is then held constant during system execution.
But let's say you're on your laptop, using your word processor while also running a compute-intensive operation in the background, like video encoding. You select the spell-check tool, and the word processor, configured for the laptop environment, deploys the heavyweight spell checker. With less compute available, the heavyweight spell checker is no longer an efficient option, delivering a sluggish response. In that moment, wouldn't it be great if the word processor could recognize the limited resources and switch to the lightweight spell checker for a better user experience?
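To make that concrete, here is a minimal sketch of the kind of on-the-fly switch described above. Everything in it is hypothetical: the two spell-checker classes, the available_cpu_fraction proxy, and the 0.5 threshold are illustrative stand-ins, not components of any real word processor or of the system described in our paper.

```python
import os

# Hypothetical module configurations: two spell checkers that trade
# accuracy for speed (illustrative stubs, not real product components).
class LightweightSpellChecker:
    def check(self, text):
        # Fast, approximate: flag only words missing from a tiny dictionary.
        small_dictionary = {"the", "a", "word", "processor", "spell"}
        return [w for w in text.lower().split() if w not in small_dictionary]


class HeavyweightSpellChecker:
    def check(self, text):
        # Slower, more accurate: imagine an edit-distance search over a large
        # lexicon here; stubbed out to keep the sketch self-contained.
        large_dictionary = {"the", "a", "word", "processor", "spell",
                            "checker", "lightweight", "heavyweight"}
        return [w for w in text.lower().split() if w not in large_dictionary]


def available_cpu_fraction():
    """Crude proxy for spare compute: 1.0 means the machine is idle."""
    load = os.getloadavg()[0]      # 1-minute load average (POSIX only)
    cores = os.cpu_count() or 1
    return max(0.0, 1.0 - load / cores)


def choose_spell_checker():
    # A static configuration hard-codes one choice per target device;
    # this dynamic version re-decides every time the tool is invoked.
    if available_cpu_fraction() < 0.5:   # threshold chosen arbitrarily
        return LightweightSpellChecker()
    return HeavyweightSpellChecker()


if __name__ == "__main__":
    checker = choose_spell_checker()
    print(type(checker).__name__, checker.check("The wrod processor"))
```

A hand-coded rule like this already improves on a fixed configuration, but picking the right threshold by hand quickly becomes untenable as the number of modules, configurations, and environments grows.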
We propose a metareasoning approach that can assess a software system pipeline and adjust the parameters of individual modules to strike the right tradeoff among latency, accuracy, and other factors, ensuring optimal operation of the entire system in real time. Viewing software pipeline optimization as a long-range sequential decision-making problem, we turn to reinforcement learning to accomplish this. Our paper on the work, "Metareasoning in Modular Software Systems: On-the-Fly Configuration using Reinforcement Learning with Rich Contextual Representations," by Aditya Modi, Alekh Agarwal, Adith Swaminathan, Besmira Nushi, Sean Andrist, Eric Horvitz, and myself, will be presented at the 34th AAAI Conference on Artificial Intelligence by Modi, who worked on the research during his MSR AI PhD internship.

Complicated multimodal software systems are all around us. They're in our word processors, online banking, commercial search engines, and operating systems, and they're getting bigger by the day as more mission-critical systems, like self-driving cars, become a reality. And changes to the environments in which these systems run aren't limited to situations in which other systems are using available resources, as in the word processor example; different kinds of input to the software pipeline might also require different amounts of resources to achieve the overall application-level utility, which measures how well the software pipeline as a whole is performing and often depends on both latency and accuracy.

Today, it takes teams of engineers writing thousands of lines of code per module to build and maintain these systems. The rise of deep learning, however, is challenging the notion of how we write them. Andrej Karpathy, senior director of AI at Tesla, envisions a move toward writing software in a data-driven manner, a shift he calls Software 2.0. In a world of "Software 2.0," instead of writing a module in code, one would gather input-output data and then train a module, much like a deep neural network, to learn to produce the correct output for a given input. Such an approach would conveniently allow an entire pipeline to be composed of differentiable modules, and the standard pairing of gradient descent with backpropagation, a powerful credit assignment technique in supervised learning, could be used to optimize for the overall desired utility.

Unfortunately, such an approach may not be practical for even the simplest of pipelines. Pipelines would become uninterpretable, a limitation Karpathy acknowledges, and hard to debug. More importantly, backpropagation plus gradient descent learns in expectation; indeed, most of the supervised learning machinery is understood, in both theory and practice, in expectation over data. That makes mission-critical systems infeasible to construct following this prescription. Even if these challenges could be surmounted in the future, a vast majority of legacy software pipelines and components, "Software 1.0," would remain to be optimized. We're interested in the promise of Software 2.0, but we see benefit in advancing an approach for optimizing Software 1.0 systems, since they carry substantial intrinsic value.

Metareasoning vs. end-to-end differentiation
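To ground the comparison in this section, the sketch below illustrates the two ingredients introduced above: an application-level utility that combines accuracy and latency, and a metareasoner that treats the choice of pipeline configuration as an action in a sequential decision problem. It is not the method from the paper; the utility weight, the configuration names, and the epsilon-greedy update are all assumptions, and the context-free bandit here is only a simplified stand-in for the contextual reinforcement learning we actually use.

```python
import random

# Hypothetical application-level utility: reward accuracy, penalize latency.
# The 0.5 weight is an assumption for illustration, not a value from the paper.
def utility(accuracy, latency_seconds, latency_weight=0.5):
    return accuracy - latency_weight * latency_seconds


class Metareasoner:
    """Epsilon-greedy chooser over whole-pipeline configurations.

    A deliberately simplified, context-free stand-in for contextual
    reinforcement learning: it keeps a running average of the utility
    observed for each configuration and usually picks the best one so
    far, exploring occasionally.
    """

    def __init__(self, configurations, epsilon=0.1):
        self.configurations = list(configurations)
        self.epsilon = epsilon
        self.totals = {c: 0.0 for c in self.configurations}
        self.counts = {c: 0 for c in self.configurations}

    def choose(self):
        if random.random() < self.epsilon or not any(self.counts.values()):
            return random.choice(self.configurations)

        def average(c):
            return self.totals[c] / self.counts[c] if self.counts[c] else float("-inf")

        return max(self.configurations, key=average)

    def update(self, configuration, observed_utility):
        self.totals[configuration] += observed_utility
        self.counts[configuration] += 1


def run_pipeline(configuration):
    # Stand-in for executing the real modules: heavier configurations are
    # more accurate but slower, with a bit of noise.
    base_accuracy, base_latency = {"light": (0.80, 0.05),
                                   "heavy": (0.95, 0.40)}[configuration]
    accuracy = base_accuracy + random.uniform(-0.02, 0.02)
    latency = base_latency + random.uniform(0.0, 0.05)
    return accuracy, latency


if __name__ == "__main__":
    meta = Metareasoner(["light", "heavy"])
    for _ in range(200):
        config = meta.choose()
        accuracy, latency = run_pipeline(config)
        meta.update(config, utility(accuracy, latency))
    print("times each configuration was chosen:",
          {c: meta.counts[c] for c in meta.configurations})
```

In the real setting, the decision is conditioned on a rich representation of the current input and system state, which is what allows the policy to switch configurations on the fly rather than settle on a single best setting.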