{"id":739090,"date":"2021-04-14T10:47:59","date_gmt":"2021-04-14T17:47:59","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?p=739090"},"modified":"2021-04-30T11:05:41","modified_gmt":"2021-04-30T18:05:41","slug":"reinforcing-program-correctness-with-reinforcement-learning","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/research\/blog\/reinforcing-program-correctness-with-reinforcement-learning\/","title":{"rendered":"Reinforcing program correctness with reinforcement learning"},"content":{"rendered":"\n<figure class=\"wp-block-image alignwide size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"576\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2021\/04\/1400x788_QL_no_logo_still_v2-1-1024x576.jpg\" alt=\"\" class=\"wp-image-740335\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2021\/04\/1400x788_QL_no_logo_still_v2-1-1024x576.jpg 1024w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2021\/04\/1400x788_QL_no_logo_still_v2-1-300x169.jpg 300w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2021\/04\/1400x788_QL_no_logo_still_v2-1-768x432.jpg 768w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2021\/04\/1400x788_QL_no_logo_still_v2-1-1536x864.jpg 1536w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2021\/04\/1400x788_QL_no_logo_still_v2-1-2048x1153.jpg 2048w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2021\/04\/1400x788_QL_no_logo_still_v2-1-16x9.jpg 16w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2021\/04\/1400x788_QL_no_logo_still_v2-1-1066x600.jpg 1066w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2021\/04\/1400x788_QL_no_logo_still_v2-1-655x368.jpg 655w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2021\/04\/1400x788_QL_no_logo_still_v2-1-343x193.jpg 343w, 
https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2021\/04\/1400x788_QL_no_logo_still_v2-1-640x360.jpg 640w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2021\/04\/1400x788_QL_no_logo_still_v2-1-960x540.jpg 960w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2021\/04\/1400x788_QL_no_logo_still_v2-1-1280x720.jpg 1280w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2021\/04\/1400x788_QL_no_logo_still_v2-1-1920x1080.jpg 1920w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>Many of our online activities, from receiving and sending emails to searching for information to streaming movies, are driven behind the scenes by cloud-based distributed architectures. Writing concurrent software\u2014programs with multiple logical threads of execution\u2014is of paramount importance to scale to these growing computing needs. Unfortunately, writing <em>correct<\/em> concurrent software is challenging. Unit, integration, and even stress testing don\u2019t provide reasonable guarantees about the correctness of a concurrent program. Thus, insidious defects can remain latent in software until the late stages of development, potentially adding cost and stress to already tight timelines designed to help ensure new software is still relevant upon release.<\/p>\n\n\n\n<p><em>Controlled concurrency testing<\/em> (CCT), an emerging approach in this space, aims to take over the concurrency in a program so that thread\/process interleavings can be directly controlled. Using a variety of strategies, CCT attempts to identify buggy program executions, converting the testing problem into a search problem over the space of all possible program behaviors, which can typically be astronomical in number for concurrent programs. 
<\/p>\n\n\n\n<p>Under the <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/project\/coyote\/\">Coyote<\/a> project, which has been used to build several mission-critical Microsoft Azure services, we\u2019ve been working on providing effective CCT-based solutions to find complex defects arising from concurrency. Existing state-of-the-art CCT strategies are typically hand-tuned, making it nearly impossible to guarantee that a strategy that worked well in a previous application will work well with your application. This led us to ask an intriguing question: can we use machine learning, with no prior knowledge about an application, to<em> figure out <\/em>enough semantic details to expose bugs with high probability? We gave this a shot and designed <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/publication\/learning-based-controlled-concurrency-testing\/\">QL<\/a>. To the best of our knowledge, it&#8217;s the first reinforcement learning\u2013based CCT search strategy. Our strategy, even without being taught anything about concurrent programs or interleavings, was able to beat the state-of-the-art human-designed CCT strategies.<\/p>\n\n\n\n<div class=\"annotations \" data-bi-aN=\"margin-callout\">\n\t<ul class=\"annotations__list card depth-16 bg-body p-4 annotations__list--right\">\n\t\t<li class=\"annotations__list-item\">\n\t\t\t\t\t\t<span class=\"annotations__type d-block text-uppercase font-weight-semibold text-neutral-300 small\">Publication<\/span>\n\t\t\t<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/publication\/learning-based-controlled-concurrency-testing\/\" target=\"_self\" class=\"annotations__link font-weight-semibold text-decoration-none\" data-bi-type=\"annotated-link\" aria-label=\"Learning-based Controlled Concurrency Testing\" data-bi-aN=\"margin-callout\" data-bi-cN=\"Learning-based Controlled Concurrency Testing\">\n\t\t\t\tLearning-based Controlled Concurrency Testing&nbsp;<span class=\"glyph-append 
glyph-append-chevron-right glyph-append-xsmall\"><\/span>\n\t\t\t<\/a>\n\t\t\t\t\t<\/li>\n\t<\/ul>\n<\/div>\n\n\n\n<div class=\"annotations \" data-bi-aN=\"margin-callout\">\n\t<ul class=\"annotations__list card depth-16 bg-body p-4 annotations__list--left\">\n\t\t<li class=\"annotations__list-item\">\n\t\t\t\t\t\t<span class=\"annotations__type d-block text-uppercase font-weight-semibold text-neutral-300 small\">Toolkit<\/span>\n\t\t\t<a href=\"https:\/\/github.com\/microsoft\/coyote\" target=\"_self\" class=\"annotations__link font-weight-semibold text-decoration-none\" data-bi-type=\"annotated-link\" aria-label=\"Coyote\" data-bi-aN=\"margin-callout\" data-bi-cN=\"Coyote\">\n\t\t\t\tCoyote&nbsp;<span class=\"glyph-append glyph-append-share glyph-append-xsmall\"><\/span>\n\t\t\t<\/a>\n\t\t\t\t\t\t\t<p class=\"annotations__caption text-neutral-400 mt-2\">Coyote is a .NET library and tool designed to help ensure that your code is free of concurrency bugs.<\/p>\n\t\t\t\t\t<\/li>\n\t<\/ul>\n<\/div>\n\n\n\n<p>QL was presented at the <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/2020.splashcon.org\/track\/splash-2020-oopsla#About\">2020 Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA)<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>, where it won the Distinguished Artifact Award, and is available as open source for experimentation on GitHub as part of <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/microsoft.github.io\/coyote\/\">Coyote<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"498\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2021\/04\/Figure-1_-QL-Blog_UpdatedNoGrey-1024x498.jpg\" alt=\"A circular flow chart illustrating a high-level overview 
of the architecture of the controlled concurrency testing (CCT) framework. The input to CCT is a concurrent program, represented by a rectangle labeled \u201cConcurrent Program under Test.\u201d The program is assumed to have N threads, and the high-level code for three representative threads is shown in separate rectangles: each thread starts (indicated by a left brace), does some work, and terminates (indicated by a right brace). An arrow goes from the \u201cConcurrent Program under Test\u201d rectangle to a rectangle above it labeled \u201cControlled Concurrency Testing.\u201d The arrow is labeled \u201cNext Program State s\u201d and \u201cSet of enabled program actions A.\u201d The main component of CCT, a scheduler, is shown in a rectangle within the CCT rectangle. The scheduler picks the next action to execute by calling a SearchStrategy subroutine and then informs the concurrent program to execute this action. This flow is denoted by an arrow going back around from the CCT rectangle to the \u201cConcurrent Program under Test\u201d rectangle.\" class=\"wp-image-740227\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2021\/04\/Figure-1_-QL-Blog_UpdatedNoGrey-1024x498.jpg 1024w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2021\/04\/Figure-1_-QL-Blog_UpdatedNoGrey-300x146.jpg 300w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2021\/04\/Figure-1_-QL-Blog_UpdatedNoGrey-768x374.jpg 768w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2021\/04\/Figure-1_-QL-Blog_UpdatedNoGrey-16x8.jpg 16w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2021\/04\/Figure-1_-QL-Blog_UpdatedNoGrey.jpg 1122w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><figcaption>Figure 1: Controlled concurrency testing (CCT) aims to ease the challenge of writing correct concurrent software by taking over a program\u2019s concurrency. Whenever a concurrent program executes an action, it transitions to a new state. 
The CCT framework comprises a scheduler, which observes the next state (s), and a set of enabled program actions (A). Using search strategies, the scheduler identifies the next action from a particular state that maximizes the likelihood of the program transitioning to a buggy state.<\/figcaption><\/figure>\n\n\n\n<h2 id=\"cct-and-the-trouble-with-finding-concurrency-bugs\">CCT and the trouble with finding concurrency bugs<\/h2>\n\n\n\n<p>The CCT framework (Figure 1) comprises a <em>scheduler<\/em>\u2014which observes the next state of the concurrent program, such as the value of global variables or the set of inflight messages\u2014and a set of enabled program <em>actions<\/em>, such as \u201cthread T<sub>1<\/sub> writes to a global variable\u201d or \u201cthread T<sub>2<\/sub> sends a message to T<sub>1<\/sub>.\u201d Whenever the program executes an action, it transitions to a new state. The job of the scheduler is to simply select an enabled action from a given state.<\/p>\n\n\n\n<p>Since the search space of program behaviors is often very large, we need effective ways of navigating this space. Schedulers employ <em>search strategies<\/em>, which identify the next action from a particular state that maximizes the likelihood of the program transitioning to a <em>buggy <\/em>state, one where some program correctness property is violated. Thus, CCT converts the testing problem into a search problem over the space of all interleavings, looking for buggy executions. As mentioned above, existing search strategies typically use human-tuned heuristics based on previously observed bug patterns, which often can\u2019t provide guarantees that a strategy successful in one application will be as effective in another.<\/p>\n\n\n\n<p>Let us consider an example to illustrate why concurrency bugs can be hard to find. 
<em>Consensus algorithms<\/em>, which coordinate multiple concurrently executing nodes to reach an agreement, lie at the heart of many modern distributed systems, and <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/raft.github.io\/\">Raft<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> is a popular consensus algorithm.<\/p>\n\n\n\n<p>A node in the Raft protocol can assume one of three roles: <em>leader<\/em>, <em>candidate<\/em>, or <em>follower<\/em>. The leader receives client requests and replicates them among the remaining nodes. At any point, one or more candidates can initiate a leader-election round, in which nodes exchange voting messages among themselves. The candidate obtaining the most votes is deemed to be the new leader. Raft maintains an important invariant: there can be <em>at most<\/em> one leader at a time. A violation of this invariant can result in the system transitioning to a corrupted state and is a serious bug.<\/p>\n\n\n\n<p><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/github.com\/ktoso\/akka-raft\/issues\/45\">Akka Raft 2015<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>, an implementation of this protocol, had a tricky bug. Candidate nodes lacked the necessary logic to identify and discard duplicate votes, resulting in a vote getting counted multiple times. Duplicate votes can occur because of delays in the network, with a node timing out while waiting for the acknowledgment and voting again. 
To expose this bug, specific events must occur in a definite order: there needs to be an election round with multiple candidates, a candidate&nbsp;\\(A\\) must receive the most votes, a follower who voted for candidate&nbsp;\\(B\\) has to time out and send duplicate votes, the vote counts for \\(A\\)&nbsp;and \\(B\\) need to match, and the system must end up with two leaders.<\/p>\n\n\n\n<p>Out of the astronomical number of possible behaviors of Raft, it\u2019s challenging for a testing framework such as CCT to drive the system through this specific one. We turned to RL to make searching that space more effective. Additionally, RL allows for a solution that is customized to the application under test, unlike strategies that are hand-tuned based on other scenarios.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"422\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2021\/04\/GLBlogFig2UpdatedNoGrey-1024x422.jpg\" alt=\"A circular flow chart of the reinforcement learning problem. At the top, from a rectangle labeled \u201cAgent,\u201d an arrow labeled \u201cAction Taken\u201d points to a rectangle below labeled \u201cEnvironment (unknown a priori).\u201d From the \u201cEnvironment\u201d rectangle, an arrow labeled \u201cNext State Reward\/Penalty\u201d points around to the \u201cAgent\u201d rectangle.   
\" class=\"wp-image-740239\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2021\/04\/GLBlogFig2UpdatedNoGrey-1024x422.jpg 1024w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2021\/04\/GLBlogFig2UpdatedNoGrey-300x124.jpg 300w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2021\/04\/GLBlogFig2UpdatedNoGrey-768x316.jpg 768w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2021\/04\/GLBlogFig2UpdatedNoGrey-16x7.jpg 16w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2021\/04\/GLBlogFig2UpdatedNoGrey.jpg 1394w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><figcaption>Figure 2: In the reinforcement learning problem, an agent interacts with an unfamiliar environment. Each action an agent chooses to take causes the environment to undergo a state transition, allowing the agent to observe a new state, and results in a reward\/penalty for the choice. The agent\u2019s goal is to learn a sequence of actions that maximizes its expected reward.<\/figcaption><\/figure>\n\n\n\n<h2 id=\"ql-cct-meets-q-learning\">QL: CCT meets Q-learning<\/h2>\n\n\n\n<p>The reinforcement learning (RL) problem (Figure 2) consists of an <em>agent <\/em>interacting with an <em>environment <\/em>about which it has no prior knowledge. At each step, the agent chooses an action, which causes the environment to undergo a state transition. In response, the agent observes a new environment state and receives a <em>reward\/penalty<\/em> as feedback for its choice. The agent\u2019s goal is to learn a sequence of actions that maximizes its expected reward. This model has achieved spectacular success in domains such as robotics; game playing, including games like Go and backgammon; and autonomous driving.<\/p>\n\n\n\n<p>A careful look at the CCT and RL architectures reveals some striking similarities. 
Both have entities (an agent in RL and the search strategy in CCT) that are navigating an unknown search space (the environment in RL and the concurrent program under test in CCT) with a specific objective (maximizing expected reward in RL and maximizing the likelihood of finding a concurrency bug in CCT). This was our starting point. We mapped the search strategy component in CCT to an RL agent and the concurrent program under test to the unknown environment.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"576\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2021\/04\/1400x788_QL_no_logo_still-1024x576.jpg\" alt=\"A circular flow chart of the QL framework. At the top is a rectangle labeled \u201cQL Scheduling Strategy\u201d with the word \u201cAgent\u201d in parentheses, representing that the strategy is mapped to an RL agent. From the rectangle, an arrow points around to a rectangle below it that contains an input state space. The rectangle is labeled \u201cProgram under test\u201d with the word \u201cEnvironment\u201d in parentheses, representing that the program is mapped to the unknown environment. 
The arrow is labeled \u201cNext op,\u201d and alongside it are the Softmax selection function and the value update formula.\" class=\"wp-image-740191\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2021\/04\/1400x788_QL_no_logo_still-1024x576.jpg 1024w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2021\/04\/1400x788_QL_no_logo_still-300x169.jpg 300w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2021\/04\/1400x788_QL_no_logo_still-768x432.jpg 768w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2021\/04\/1400x788_QL_no_logo_still-1536x864.jpg 1536w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2021\/04\/1400x788_QL_no_logo_still-2048x1152.jpg 2048w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2021\/04\/1400x788_QL_no_logo_still-16x9.jpg 16w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2021\/04\/1400x788_QL_no_logo_still-1066x600.jpg 1066w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2021\/04\/1400x788_QL_no_logo_still-655x368.jpg 655w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2021\/04\/1400x788_QL_no_logo_still-343x193.jpg 343w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2021\/04\/1400x788_QL_no_logo_still-640x360.jpg 640w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2021\/04\/1400x788_QL_no_logo_still-960x540.jpg 960w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2021\/04\/1400x788_QL_no_logo_still-1280x720.jpg 1280w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2021\/04\/1400x788_QL_no_logo_still-1920x1080.jpg 1920w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><figcaption>Figure 3: In QL, a new controlled concurrency testing (CCT) search strategy, the search strategy component in CCT is mapped to a reinforcement learning agent and the concurrent program under test to the unknown environment. 
QL leverages the classic Q-learning algorithm from the RL domain, allowing it to maximize coverage of the program\u2019s state space in its search for concurrency bugs.<\/figcaption><\/figure>\n\n\n\n<p>Our resulting search strategy leverages the classic <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/link.springer.com\/content\/pdf\/10.1007\/BF00992698.pdf\">Q-learning algorithm<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> from the RL domain. For each state <em>s<\/em> and for each action <em>a<\/em> enabled from that state, QL maintains a quality metric <em>Q(s,a)<\/em> called the Q-value. QL decides which action to execute next using a Softmax selection function. In response, the program transitions to a new state and presents a <em>penalty<\/em> signal back to QL, which uses it to adjust the Q-values with the objective of maximizing coverage of the program\u2019s state space.<\/p>\n\n\n\n<p>The key advantage of QL is that instead of focusing on specific bug patterns, you can <em>specify<\/em> the state space of the program that is <em>relevant<\/em> to the logic being tested, and QL then works to maximize coverage of this state space.<\/p>\n\n\n\n<h2 id=\"controlled-testing-of-the-raft-protocol\">Controlled testing of the Raft protocol<\/h2>\n\n\n\n<p>We compared QL to two existing state-of-the-art CCT search strategies, <em>Random<\/em> and <em>probabilistic concurrency testing<\/em> (PCT). With Raft, Random arbitrarily picks one of the nodes to execute at each step. PCT assigns a set of priorities to all the participating nodes and selects the node with the highest priority. It then sporadically lowers the priority of the currently executing node and passes control to the node that had the second-highest priority. 
Note that <em>both <\/em>these strategies are geared toward empirically observed bug patterns.<\/p>\n\n\n\n<p>How can we tell if a strategy is making progress toward uncovering the bug in Raft? Two reasonable metrics include the total number of elected leaders and the total number of election rounds with multiple candidates. A strategy maximizing these metrics is likely to observe more program behaviors in Raft and have a higher likelihood of exposing the bug.<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"456\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2021\/04\/Figure-4-_QL-Blog-_High-Res-1024x456.jpg\" alt=\"Two line graphs showing the performance of QL, PCT, and Random on the Raft protocol. The x-axis of both denotes individual runs, beginning with 320 runs and ending with 10,240. The y-axis of the graph on the left shows the total number of elected leaders explored, from 0 to 10,000; the y-axis of the graph on the right shows the total number of election rounds with multiple candidates explored, from 0 to 5,000. In the first line graph, as the number of runs increases, QL (represented by a solid blue line with circles for plot points) increasingly explores the most number of leaders elected, followed by Random (represented by a dotted and dashed red line with x\u2019s for plot points) and PCT (represented by a dashed green line with triangles for plot points). In the second line graph, QL increasingly explores the most number of election rounds with multiple candidates, followed by Random and then PCT.    
\" class=\"wp-image-739762\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2021\/04\/Figure-4-_QL-Blog-_High-Res-1024x456.jpg 1024w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2021\/04\/Figure-4-_QL-Blog-_High-Res-300x134.jpg 300w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2021\/04\/Figure-4-_QL-Blog-_High-Res-768x342.jpg 768w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2021\/04\/Figure-4-_QL-Blog-_High-Res-16x7.jpg 16w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2021\/04\/Figure-4-_QL-Blog-_High-Res-1066x476.jpg 1066w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2021\/04\/Figure-4-_QL-Blog-_High-Res.jpg 1068w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><figcaption>Figure 4: QL outperforms two existing state-of-the-art CCT search strategies, Random and probabilistic concurrency testing (PCT), when run on Raft, a popular consensus algorithm used in modern distributed systems. QL maximizes the total number of elected leaders explored (left) and the total number of election rounds with multiple candidates explored (right), indicators of the amount of coverage. The more behaviors observed by a strategy, the more likely it is to expose a bug.<\/figcaption><\/figure><\/div>\n\n\n\n<p>In the above line graphs, the x-axis denotes individual runs of the Raft protocol. The y-axis of the first graph shows the total number of elected leaders explored by the different strategies, and the y-axis of the second graph shows the total number of election rounds in which there are multiple candidates. QL significantly outperforms Random and PCT on both metrics. Does this mean QL is better at finding the Raft bug? You bet! If you invoke a CCT framework 100 times with the different strategies, QL finds the bug 95 times, Random finds it four times, and PCT <em>never<\/em> finds the bug. 
We observed this repeatedly, in benchmarks ranging from complex protocols to production Azure services: QL can glean <em>application-specific<\/em> semantic information during its exploration, allowing it to consistently outperform state-of-the-art CCT strategies.<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"514\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2021\/04\/Figure5_QLBlog_High-Res-1024x514.jpg\" alt=\"A line graph comparing the state coverage of QL, Random, and PCT in Raft. On the x-axis is the number of iterations, beginning with 320 and ending with 10,240; on the y-axis is the number of unique abstract states, from 0 to 500,000. As the number of iterations increases, the number of unique abstract states observed by each strategy increases, with QL (represented by a solid blue line with circles for plot points) observing the most and experiencing the biggest increase, followed by Random (represented by a dotted green line with squares for plot points) and then PCT (represented by a dashed red line with triangles for plot points).
\" class=\"wp-image-739765\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2021\/04\/Figure5_QLBlog_High-Res-1024x514.jpg 1024w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2021\/04\/Figure5_QLBlog_High-Res-300x151.jpg 300w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2021\/04\/Figure5_QLBlog_High-Res-768x385.jpg 768w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2021\/04\/Figure5_QLBlog_High-Res-16x8.jpg 16w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2021\/04\/Figure5_QLBlog_High-Res.jpg 1090w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><figcaption>Figure 5: In more than 10,000 executions of Raft, QL covers nearly three times more states compared with Random and five times more states compared with PCT, a testament to QL\u2019s superior bug-finding ability.<\/figcaption><\/figure><\/div>\n\n\n\n<p>We can gain additional insights into the superior bug-finding ability of QL by comparing its state coverage with that of Random and PCT. The state of the protocol relevant for testing includes the set of in-flight messages (network state) and the status of each node (its role and whom it\u2019s currently voting for). This is the state that we exposed to QL through the appropriate APIs.<\/p>\n\n\n\n<p>As shown in the figure above, in around 10,000 executions of Raft, QL covers nearly three times more states compared with Random and five times more states compared with PCT. 
The QL strategy is geared toward thorough coverage of the relevant state space, and the superior bug-finding ability is a byproduct.<\/p>\n\n\n\n<h2 id=\"find-out-more\">Find out more<\/h2>\n\n\n\n<p>QL is joint work with Microsoft Research Senior Research Software Engineer <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/pdeligia\/\">Pantazis Deligiannis<\/a>, Harvard University Postdoctoral Fellow <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/sites.google.com\/view\/arpitabiswas\">Arpita Biswas<\/a>, and Microsoft Research Senior Principal Researcher <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/akashl\/\">Akash Lal<\/a>.<\/p>\n\n\n\n<p>For more information about QL, read our paper <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/publication\/learning-based-controlled-concurrency-testing\/\">\u201cLearning-Based Controlled Concurrency Testing,\u201d<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> and to learn more about Coyote, watch the <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/innovation.microsoft.com\/en-us\/tech-minutes-project-coyote\">Tech Minutes: Project Coyote video<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>. You can also try out QL using this <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/zenodo.org\/record\/4043041#.YG5pnxRKigQ\">virtual machine with everything installed<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>. Happy bug finding!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Many of our online activities, from receiving and sending emails to searching for information to streaming movies, are driven behind the scenes by cloud-based distributed architectures. 
Writing concurrent software\u2014programs with multiple logical threads of execution\u2014is of paramount importance to scale to these growing computing needs. Unfortunately, writing correct concurrent software is challenging. Unit, integration, and [&hellip;]<\/p>\n","protected":false},"author":38838,"featured_media":740335,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","footnotes":""},"categories":[1],"tags":[],"research-area":[13560],"msr-region":[],"msr-event-type":[],"msr-locale":[268875],"msr-post-option":[],"msr-impact-theme":[],"msr-promo-type":[],"msr-podcast-series":[],"class_list":["post-739090","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-research-blog","msr-research-area-programming-languages-software-engineering","msr-locale-en_us"],"msr_event_details":{"start":"","end":"","location":""},"podcast_url":"","podcast_episode":"","msr_research_lab":[199562],"msr_impact_theme":[],"related-publications":[],"related-downloads":[],"related-videos":[],"related-academic-programs":[],"related-groups":[],"related-projects":[615984],"related-events":[],"related-researchers":[{"type":"user_nicename","value":"Suvam Mukherjee","user_id":40237,"display_name":"Suvam Mukherjee","author_link":"<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/sumukherjee\/\" aria-label=\"Visit the profile page for Suvam Mukherjee\">Suvam Mukherjee<\/a>","is_active":false,"last_first":"Mukherjee, Suvam","people_section":0,"alias":"sumukherjee"}],"msr_type":"Post","featured_image_thumbnail":"<img width=\"960\" height=\"540\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2021\/04\/1400x788_QL_no_logo_still_v2-1-960x540.jpg\" class=\"img-object-cover\" alt=\"A circular flow chart of the QL framework. 
At the top is a rectangle labeled \u201cQL Scheduling Strategy\u201d with the word \u201cAgent\u201d in parenthesis, representing that the strategy is mapped to an RL agent. From the rectangle, an arrow points around to a rectangle below it that contains a input state space. The rectangle is labeled \u201cProgram under test\u201d with the word \u201cEnvironment\u201d in parenthesis, representing that the program is mapped to the unknown environment. The arrow is labeled \u201cNext op,\u201d and alongside it are the Softmax selection function and the value update formula.\" decoding=\"async\" loading=\"lazy\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2021\/04\/1400x788_QL_no_logo_still_v2-1-960x540.jpg 960w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2021\/04\/1400x788_QL_no_logo_still_v2-1-300x169.jpg 300w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2021\/04\/1400x788_QL_no_logo_still_v2-1-1024x576.jpg 1024w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2021\/04\/1400x788_QL_no_logo_still_v2-1-768x432.jpg 768w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2021\/04\/1400x788_QL_no_logo_still_v2-1-1536x864.jpg 1536w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2021\/04\/1400x788_QL_no_logo_still_v2-1-2048x1153.jpg 2048w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2021\/04\/1400x788_QL_no_logo_still_v2-1-16x9.jpg 16w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2021\/04\/1400x788_QL_no_logo_still_v2-1-1066x600.jpg 1066w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2021\/04\/1400x788_QL_no_logo_still_v2-1-655x368.jpg 655w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2021\/04\/1400x788_QL_no_logo_still_v2-1-343x193.jpg 343w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2021\/04\/1400x788_QL_no_logo_still_v2-1-640x360.jpg 640w, 
https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2021\/04\/1400x788_QL_no_logo_still_v2-1-1280x720.jpg 1280w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2021\/04\/1400x788_QL_no_logo_still_v2-1-1920x1080.jpg 1920w\" sizes=\"(max-width: 960px) 100vw, 960px\" \/>","byline":"<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/sumukherjee\/\" title=\"Go to researcher profile for Suvam Mukherjee\" aria-label=\"Go to researcher profile for Suvam Mukherjee\" data-bi-type=\"byline author\" data-bi-cN=\"Suvam Mukherjee\">Suvam Mukherjee<\/a>","formattedDate":"April 14, 2021","formattedExcerpt":"Many of our online activities, from receiving and sending emails to searching for information to streaming movies, are driven behind the scenes by cloud-based distributed architectures. Writing concurrent software\u2014programs with multiple logical threads of execution\u2014is of paramount importance to scale to these growing computing needs.&hellip;","locale":{"slug":"en_us","name":"English","native":"","english":"English"},"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/739090"}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/users\/38838"}],"replies":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/comments?post=739090"}],"version-history":[{"count":22,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/739090\/revisions"}],"predecessor-version":[{"id":743113,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/739090\/revisions\/743113"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media\/740335
"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=739090"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/categories?post=739090"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/tags?post=739090"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=739090"},{"taxonomy":"msr-region","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-region?post=739090"},{"taxonomy":"msr-event-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-event-type?post=739090"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=739090"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=739090"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=739090"},{"taxonomy":"msr-promo-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-promo-type?post=739090"},{"taxonomy":"msr-podcast-series","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-podcast-series?post=739090"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}