{"id":739090,"date":"2021-04-14T10:47:59","date_gmt":"2021-04-14T17:47:59","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?p=739090"},"modified":"2021-04-30T11:05:41","modified_gmt":"2021-04-30T18:05:41","slug":"reinforcing-program-correctness-with-reinforcement-learning","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/research\/blog\/reinforcing-program-correctness-with-reinforcement-learning\/","title":{"rendered":"Reinforcing program correctness with reinforcement learning"},"content":{"rendered":"\n

Many of our online activities, from receiving and sending emails to searching for information to streaming movies, are driven behind the scenes by cloud-based distributed architectures. Writing concurrent software (programs with multiple logical threads of execution) is essential to scaling with these growing computing needs. Unfortunately, writing correct<\/em> concurrent software is challenging. Unit, integration, and even stress testing don\u2019t provide reasonable guarantees about the correctness of a concurrent program. Thus, insidious defects can remain latent in software until the late stages of development, potentially adding cost and stress to already tight timelines designed to help ensure new software is still relevant upon release.<\/p>\n\n\n\n

Controlled concurrency testing<\/em> (CCT), an emerging approach in this space, aims to take over the concurrency in a program so that thread\/process interleavings can be directly controlled. This converts the testing problem into a search problem: using a variety of strategies, CCT searches the space of all possible program behaviors, which is typically astronomical in size for concurrent programs, for buggy executions.<\/p>\n\n\n\n

Under the Coyote<\/a> project, which has been used to build several mission-critical Microsoft Azure services, we\u2019ve been working on providing effective CCT-based solutions to find complex defects arising from concurrency. Existing state-of-the-art CCT strategies are typically hand-tuned, so there is little assurance that a strategy that worked well on a previous application will work well on yours. This led us to ask an intriguing question: can we use machine learning, with no prior knowledge about an application, to figure out <\/em>enough semantic details to expose bugs with high probability? We gave this a shot and designed QL<\/a>. To the best of our knowledge, it\u2019s the first reinforcement learning\u2013based CCT search strategy. Our strategy, even without being taught anything about concurrent programs or interleavings, was able to beat the state-of-the-art human-designed CCT strategies.<\/p>\n\n\n\n

\n\t