{"id":582940,"date":"2019-05-03T10:47:40","date_gmt":"2019-05-03T17:47:40","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?p=582940"},"modified":"2019-05-06T09:07:53","modified_gmt":"2019-05-06T16:07:53","slug":"beyond-spell-checkers-enhancing-the-editing-process-with-deep-learning","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/research\/blog\/beyond-spell-checkers-enhancing-the-editing-process-with-deep-learning\/","title":{"rendered":"Beyond spell checkers: Enhancing the editing process with deep learning"},"content":{"rendered":"
<\/span><\/a><\/p>\n \u201cHere\u2019s my conference paper. What do you think?\u201d After hours of agonizing over words and illustrations, sharing a draft is a moment of great pride. All too often, it is soon followed by embarrassment when a colleague returns it with reams of edits. Many of these are simple grammar or style fixes or requests for citations: minor suggestions that are less fulfilling and less valuable, for author and editor alike, than feedback on the substance of the content. The most basic form of feedback, pointing out misspelled words, is already ubiquitously automated. Are there other, more complex classes of editorial tasks that can be learned and automated?<\/p>\n One domain particularly well suited to exploring this question is source code editing. Because version control tools record the history of every change and refactoring utilities apply well-defined transformations, source code repositories provide a wealth of data for training and testing deep learning models that learn how to represent, discover, and apply edits. In our paper \u201cLearning to Represent Edits,\u201d<\/span><\/a> being presented at the 2019 International Conference on Learning Representations (ICLR)<\/span><\/a>, we\u2019ve used these resources to build unsupervised deep learning models that have shown promise in classifying edits and applying them to both code and natural language.<\/p>\n Deep learning has a strong track record in generating and understanding both natural language and source code. The major challenge in this work is to devise a method that specifically encodes edits<\/em> so that deep learning techniques can process and automate them. To understand our approach, consider the following analogy:<\/p>\n Imagine a photocopier with a large number of settings. 
For example, there may be switches for black-and-white versus color, for enlarging or shrinking a document, and for selecting a paper size. We could represent the combination of these settings as a vector, which we\u2019ll call \u0394, describing the configuration of all the knobs and switches on the photocopier. Photocopying an original document x–<\/sub> with settings \u0394 produces a new document x+<\/sub> with the corresponding edits, such as the conversion from color to black-and-white, applied. The photocopier applies the edits itself; \u0394 is only a high-level representation of them. That is, \u0394 encodes the intent \u201cproduce a black-and-white copy,\u201d and it is up to the photocopier to interpret this encoding and produce the low-level instructions that fine-tune its internal engine for a given input x–<\/sub>.<\/p>\n In our application to source code and natural language editing, the photocopier is replaced by a neural network, and the edit representation \u0394 is a low-dimensional vector provided to that network. We restrict the capacity of this vector to encourage the system to encode only the high-level semantics of the edit, such as \u201cconvert output to black-and-white,\u201d rather than low-level semantics such as \u201cmake pixel (0,0) white, make pixel (0,1) black, \u2026\u201d. As a result, given two different input documents x–<\/sub> and x–<\/sub>′, the same \u0394 should perform the same edit on both, as shown below for the example source code.<\/p>\nLearning edit representations<\/h2>\n
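The photocopier analogy can be made concrete with a small Python sketch. It illustrates the analogy only and is not the model from the paper: in the paper\u2019s system the editor is a neural network and \u0394 is a learned low-dimensional vector, whereas here both are hand-written. A document is assumed to be a grid of RGB pixels, and the names `Delta`, `photocopy`, `black_and_white`, and `scale` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

Pixel = Tuple[int, int, int]    # (r, g, b), each channel in 0..255
Document = List[List[Pixel]]    # a document is a grid of pixel rows

BLACK: Pixel = (0, 0, 0)
WHITE: Pixel = (255, 255, 255)


@dataclass(frozen=True)
class Delta:
    """Hypothetical high-level edit settings: the photocopier's knobs."""
    black_and_white: bool  # True -> threshold every pixel to black or white
    scale: int = 1         # integer enlargement factor (1 = same size)

    def __post_init__(self) -> None:
        if self.scale < 1:
            raise ValueError("scale must be a positive integer")


def to_black_and_white(p: Pixel) -> Pixel:
    # Threshold a luminance estimate (ITU-R BT.601 weights) at mid-gray.
    luminance = 0.299 * p[0] + 0.587 * p[1] + 0.114 * p[2]
    return WHITE if luminance >= 128 else BLACK


def photocopy(x_minus: Document, delta: Delta) -> Document:
    """Hand-written stand-in for the editor: turns the high-level settings
    in `delta` into low-level, per-pixel operations for this particular input."""
    x_plus: Document = []
    for row in x_minus:
        new_row: List[Pixel] = []
        for p in row:
            q = to_black_and_white(p) if delta.black_and_white else p
            new_row.extend([q] * delta.scale)      # enlarge horizontally
        for _ in range(delta.scale):               # enlarge vertically
            x_plus.append(list(new_row))
    return x_plus


if __name__ == "__main__":
    make_bw = Delta(black_and_white=True)
    doc_a = [[(255, 0, 0), (250, 250, 250)]]   # one row of two pixels
    doc_b = [[(0, 0, 255)], [(200, 200, 0)]]   # two rows of one pixel
    # One Delta, two different inputs: the same high-level edit is applied,
    # although the pixel-level changes differ between the documents.
    print(photocopy(doc_a, make_bw))  # [[(0, 0, 0), (255, 255, 255)]]
    print(photocopy(doc_b, make_bw))  # [[(0, 0, 0)], [(255, 255, 255)]]
```

Note that `Delta` never mentions any particular pixel: all input-specific work happens inside `photocopy`, mirroring the split between the compact edit representation and the network that interprets it.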