{"id":765556,"date":"2021-08-11T09:14:49","date_gmt":"2021-08-11T16:14:49","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?p=765556"},"modified":"2021-08-11T09:14:51","modified_gmt":"2021-08-11T16:14:51","slug":"safe-program-merges-at-scale-a-grand-challenge-for-program-repair-research","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/research\/blog\/safe-program-merges-at-scale-a-grand-challenge-for-program-repair-research\/","title":{"rendered":"Safe program merges at scale: A grand challenge for program repair research"},"content":{"rendered":"\n
<\/figure>\n\n\n\n

Since the computing world began embracing an open-source approach to programming, building software has become increasingly collaborative. On teams ranging from two developers to thousands, contributors simultaneously edit different components as they build software systems and keep them running smoothly, and a three-way merge<\/span><\/a> is the<\/em> mechanism for integrating their individual changes. But with so many people independently altering the same code base, it\u2019s unsurprising that their changes don\u2019t always combine cleanly, resulting in bad merges<\/em>.<\/p>\n\n\n\n

Bad merges take a range of forms. For example, textual merge conflicts occur when the changes from two branches can\u2019t be integrated by the default text-based merge algorithms used by version control systems such as Git, Concurrent Versions System (CVS), and Subversion. Most such conflicts are spurious from the standpoint of program execution and often stem from the 40-year-old diff3 algorithm<\/span><\/a>, which merges text without any awareness of the syntax and semantics of programming languages. These conflicts prevent developers from checking in their code, forcing them to fix the conflict manually or, if the correct resolution is ambiguous, to consult other developers. In other cases, bad program merges are subtler and more costly: they introduce semantic merge conflicts that may cause a compilation failure, break a test, or, worse, introduce a regression. Bad merges constitute between 10 percent and 20 percent of all merges for large projects<\/span><\/a> and collectively result in stalled pull requests, failed continuous integration runs, and bugs in deployment, including exploitable security vulnerabilities<\/span><\/a>. Coping with bad merges can delay development by hours to days or erode customer trust in a product, and it is one of the well-known pain points<\/span><\/a> in collaborative software development. It also often discourages less experienced developers from making meaningful contributions to large open-source projects.<\/p>\n\n\n\n
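To make the textual failure mode concrete, here is a minimal Python sketch of a line-based three-way merge. It is not diff3: it assumes, for simplicity, that no lines were inserted or deleted, so line i of each version corresponds to line i of the others, whereas diff3 first aligns the versions with a diff. The function names and sample files are hypothetical. Two developers each add a different, independent import to the same line, yet a purely textual merge reports a conflict.

```python
from typing import List, Optional, Tuple


def merge_region(base: str, ours: str, theirs: str) -> Optional[str]:
    """Apply the three-way rule to one aligned region; None signals a conflict."""
    if ours == theirs:    # both sides agree (this includes "neither side changed it")
        return ours
    if ours == base:      # only "theirs" changed the region
        return theirs
    if theirs == base:    # only "ours" changed the region
        return ours
    return None           # both changed it, differently: a textual conflict


def naive_line_merge(base: List[str], ours: List[str],
                     theirs: List[str]) -> Tuple[List[str], List[int]]:
    """Merge three versions line by line; return (merged lines, conflicting line indices).

    Simplifying assumption: the versions have the same length and no lines were
    inserted or deleted, so line i corresponds across all three. Real diff3 first
    aligns the versions with a diff; this sketch skips that step.
    """
    if not (len(base) == len(ours) == len(theirs)):
        raise ValueError("this sketch only handles line-aligned versions")
    merged: List[str] = []
    conflicts: List[int] = []
    for i, (b, o, t) in enumerate(zip(base, ours, theirs)):
        result = merge_region(b, o, t)
        if result is None:
            conflicts.append(i)
            # Emit Git-style conflict markers around the two competing lines.
            merged.extend(["<<<<<<< ours", o, "=======", t, ">>>>>>> theirs"])
        else:
            merged.append(result)
    return merged, conflicts


# Two developers each add a different, independent import to the same line.
base = ["import os", "", "def main():", "    run()"]
ours = ["import os, sys", "", "def main():", "    run()"]
theirs = ["import os, json", "", "def main():", "    run()"]

merged, conflicts = naive_line_merge(base, ours, theirs)
print(conflicts)  # [0]
print("\n".join(merged))
```

A syntax-aware merge could recognize both lines as import lists and produce `import os, sys, json`; a text-only merge has no basis for doing so and must hand the decision back to the developer.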

Over the past few years, we at Microsoft Research, in collaboration with our academic colleagues and informed by recent large-scale studies of merge conflicts<\/a>, have been revisiting the challenge, focusing on properties of safe program merges that let us harness program verification, program synthesis, and machine learning. First, the safety, or correctness, of a merge can be characterized by crisp formal specifications, which makes merges amenable to verification with mathematical guarantees. Second, open-source software is a natural source of merge conflict and resolution data, which deep learning approaches can leverage. Third, the way developers resolve bad merges follows project-specific patterns that program synthesis can capitalize on. Our work in these areas has produced several new techniques: an automatic, precise differential program verifier for ensuring a correct merge<\/a>; a deep learning\u2013based sequence-to-sequence model for synthesizing merge conflict resolutions<\/a>; and a domain-specific language that can learn repeated resolution patterns for textual merge conflicts<\/a>. Extending this work and finding ways to combine the strengths of these approaches with prior work on merging holds the promise of real improvements in developer productivity for collaboration at scale.<\/p>\n\n\n\n
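As an illustration of what a crisp merge specification can look like, the Python sketch below checks a simplified, per-output condition of the kind a merge verifier might target: on each input, an output that only version A changes relative to base O must take the value from A in the merge M (symmetrically for B), an output that neither version changes must keep the value from O, and an output that both change must end up identical in A, B, and M. This formulation, the toy programs, and all names are assumptions made for illustration, not the specification used by the verifier mentioned above. A verifier proves such a condition for all inputs; this sketch only tests sampled inputs, so it can expose violations but cannot prove their absence.

```python
from typing import Callable, Iterable, List, Tuple

# A hypothetical program version: maps an integer input to a tuple of outputs.
Program = Callable[[int], Tuple[int, ...]]


def output_ok(o: int, a: int, b: int, m: int) -> bool:
    """Check one output of base O, versions A and B, and merge M on a single input."""
    if a != o and b != o:
        return a == b == m  # both versions changed it: they must agree, and M must match
    if a != o:
        return m == a       # only A changed it: M must follow A
    if b != o:
        return m == b       # only B changed it: M must follow B
    return m == o           # neither changed it: M must preserve base behavior


def find_violations(inputs: Iterable[int], base: Program, a: Program,
                    b: Program, merged: Program) -> List[int]:
    """Return the sampled inputs on which `merged` breaks the condition.

    Testing can expose violations but, unlike a verifier, cannot prove their absence.
    """
    violations = []
    for x in inputs:
        outputs = zip(base(x), a(x), b(x), merged(x))
        if not all(output_ok(o, va, vb, m) for o, va, vb, m in outputs):
            violations.append(x)
    return violations


def base(x: int) -> Tuple[int, int]:
    return (x, x // 10)           # outputs: (price, tax)


def ver_a(x: int) -> Tuple[int, int]:
    return (x + 1, x // 10)       # A adds a fee to the price


def ver_b(x: int) -> Tuple[int, int]:
    return (x, x // 5)            # B raises the tax rate


def good(x: int) -> Tuple[int, int]:
    return (x + 1, x // 5)        # keeps both edits independent


def bad(x: int) -> Tuple[int, int]:
    return (x + 1, (x + 1) // 5)  # taxes the fee-adjusted price instead


print(find_violations(range(20), base, ver_a, ver_b, good))  # []
print(find_violations(range(20), base, ver_a, ver_b, bad))   # [4, 9, 14, 19]
```

The second candidate looks like a reasonable combination of both edits, but it computes the tax from the fee-adjusted price. On an input such as 4, the tax in M changes even though neither A nor B changed it there: a semantic conflict that no textual merge tool would flag.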

\n
\n
\n\t