{"id":171415,"date":"2014-10-03T10:32:59","date_gmt":"2014-10-03T10:32:59","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/project\/nlpwin\/"},"modified":"2019-08-19T10:47:14","modified_gmt":"2019-08-19T17:47:14","slug":"nlpwin","status":"publish","type":"msr-project","link":"https:\/\/www.microsoft.com\/en-us\/research\/project\/nlpwin\/","title":{"rendered":"NLPwin"},"content":{"rendered":"
* on behalf of everyone who contributed to the development of NLPwin
NLPwin is a software project at Microsoft Research that aims to provide Natural Language Processing tools for Windows (hence, NLPwin). The project was started in 1991, just as Microsoft inaugurated the Microsoft Research group; while active development of NLPwin continued through 2002, it is still being updated regularly, primarily in service of Machine Translation.
NLPwin was, and is still, being used in a number of Microsoft products, among them the Index Server (1992-93), the Word Grammar Checker (parsing every sentence to logical form since 1996), the English Query feature for SQL Server (1998-2000), the natural language query interface for Encarta (1999, 2000), Intellishrink (2000), and, of course, Bing Translator.

Since we knew that we were developing NLPwin in part to support a grammar checker, the NLPwin grammar is designed to be broad-coverage (i.e., not domain-specific) and robust, in particular robust to grammar errors. While most grammars are learned from data annotated on the PennTreeBank, it is interesting to consider that such grammars may not be able to parse ungrammatical or fragmented input, since they have no training data for such input. The NLPwin grammar produces a parse for any input, and if no spanning parse can be assigned, it creates a "fitted" parse, combining the largest constituents that it was able to construct.

The NLP rainbow: we envisioned that with ever more sophisticated analysis capabilities, it would be possible to create a wide variety of applications. As you can see below, the generation component was not well developed, and we postulated NL applications for generation much as one hopes for a pot of gold at the end of the rainbow. Our first MT models transferred at the semantic level (papers through 2002), while today our MT transfers primarily at the syntactic level, using a mixture of syntax-based and phrase-based models.

Figure 1: The NLP rainbow (1991), our original vision for the NLP components needed and the applications possible.

The architecture follows a pipeline approach, as shown in Figure 2, where each component provides an additional layer of analysis/annotation of the input data. We designed the system to be relatively knowledge-poor in the beginning, while making use of richer and richer data sources as the need for more semantic information increased; one goal of this architecture was to preserve ambiguity until either we needed to resolve it or the data resources existed to allow its resolution. Thus, the syntactic analysis proceeds in two steps: the syntactic sketch (which today might be described as a packed forest) and the syntactic portrait, where we "unpack" the forest and construct a constituent level of analysis which is syntactic but also semantically valid. The constituency tree continues to be refined even during Logical Form processing, as more global information can be brought to bear.

Figure 2: The NLPwin components and a schematic of their output representation.
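To make the layered-annotation idea concrete, here is a minimal sketch of such a pipeline in Python, including the "fitted" fallback when no spanning parse can be found. All names (Analysis, toy_spanning_parse, and so on) are hypothetical; this is not the NLPwin implementation, which is a hand-authored rule system, only an illustration of the architecture described above.

```python
# Illustrative sketch of a layered-annotation pipeline in the spirit of Figure 2.
# Every name here is hypothetical; the real NLPwin components are rule-based.

from dataclasses import dataclass, field

@dataclass
class Analysis:
    text: str
    layers: dict = field(default_factory=dict)   # component name -> annotation layer

def morphology(a: Analysis) -> None:
    # Toy tokenizer standing in for morphological analysis.
    a.layers["tokens"] = a.text.split()

def toy_spanning_parse(tokens):
    # Pretend we can only span inputs that end with a period.
    return ("DECL", tokens) if tokens and tokens[-1].endswith(".") else None

def sketch(a: Analysis) -> None:
    # Stand-in for the syntactic sketch (roughly, a packed forest).
    tokens = a.layers["tokens"]
    parse = toy_spanning_parse(tokens)
    if parse is None:
        # "Fitted" parse: no spanning analysis was found, so combine the largest
        # constituents that could be built; every input receives some analysis.
        parse = ("FITTED", [("FRAG", [t]) for t in tokens])
    a.layers["sketch"] = parse

def portrait(a: Analysis) -> None:
    # Unpack the forest into one constituent analysis that is syntactically
    # and semantically valid (here simply passed through).
    a.layers["portrait"] = a.layers["sketch"]

def logical_form(a: Analysis) -> None:
    # Placeholder for the Logical Form graph computed from the portrait.
    a.layers["logical_form"] = {"nodes": [], "arcs": []}

def analyze(text: str) -> Analysis:
    a = Analysis(text)
    for stage in (morphology, sketch, portrait, logical_form):
        stage(a)
    return a

print(analyze("African elephants have large tusks.").layers["sketch"])  # spanning parse
print(analyze("large tusks ivory").layers["sketch"])                    # fitted fallback
```

The point of the sketch is only the shape of the control flow: each stage reads the layers added by earlier stages and writes its own, and robustness comes from the fallback rather than from requiring a spanning parse.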
A few points are worth making about the parser (a term which loosely combines the morphology, sketch, and portrait modules). First, the parser is comprised of human-authored rules. This will cause incredulity among those who are only familiar with machine-learned parsers trained on the PennTreeBank. It should be kept in mind that the NLPwin parser was constructed before the first parser was trained on the PennTreeBank, that the parser had to be fast (to support the grammar checker), and that rule writing was the norm for pre-PennTreeBank grammars. Furthermore, the grammarian tasked with writing rules was supported by a sophisticated array of NLP developer tools (created by George Heidorn), much as a programmer is now supported by Visual Studio: grammar rules could be run to and from specific points in the code, variables could be changed interactively for exploration, and, most importantly, the developer environment supported running a suite of test files, with interfaces for the grammarian to update the target files with improved parses. Second, the lead grammarian, Karen Jensen, broke with the implicit tradition in which the constituent structure is implied by the application of the parsing rules[1]. Jensen observed that binary rules are required to handle even common language phenomena such as free word order and adverbial and prepositional phrase placement. Thus, in NLPwin, we use binary rules in an augmented phrase structure grammar (APSG) formalism, computing the phrase structure as part of the actions of the rules: the rules remain binary, but the computed nodes can take an unbounded number of modifiers, as illustrated in Figure 3.

Figure 3: The derivation tree displays the history of rule application, while the computed tree provides a useful visualization of phrase structure.

Another important aspect of NLPwin is that it is the record structure, not the trees, that is the fundamental output of the analysis component (shown in Figure 4). Trees are merely a convenient form of display, using only 5 of the many attributes that make up the representation of the analysis: premodifiers (PRMODS), HEAD, postmodifiers (PSMODS), segment type (SEGTYPE), and the string value. Here is the record, a collection of attributes and values, for the node DECL1:

Figure 4: The record structure of any constituent is the heart of the NLPwin analysis.

Once the basic shape of the constituency tree has been determined, it is possible to compute the Logical Form. The goal of Logical Form is twofold: to compute the predicate-argument structure for each clause ("who did what to whom, when, where, and how?") and to normalize differing syntactic realizations of what can be considered the same "meaning". In so doing, concepts that are possibly distant in the sentence and in the constituent structure can be brought together, in large part because the Logical Form is represented as a graph, where linear order is no longer primary. The Logical Form is a directed, labeled graph: arcs are labeled with those relations that are defined to be semantic, while surface words that convey only syntactic information are represented not as nodes in the graph but as annotations on the nodes, preserving their syntactic information (not shown in the graph representation below). Consider the following Logical Form:

Figure 5: A Logical Form example.

The Logical Form graph in Figure 5 represents the direct connection between "elephants" and "have", which is interrupted by a relative clause in the surface syntax. Moreover, in analyzing the relative clause, Logical Form has performed two operations: it normalizes the passive construction and it assigns the referent of the relative pronoun "which". Other operations commonly performed by Logical Form include (but are not limited to) resolving unbounded dependencies, functional control, the indirect object paraphrase, and the assignment of modifiers.
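For a rough picture of what such a graph contains, consider a sentence along the lines of the Figure 5 example, e.g. "African elephants, which have been hunted for their ivory tusks, have large tusks" (a reconstruction; the original example is shown only in the figure). The sketch below uses illustrative relation labels and node names, not NLPwin's actual inventory, to show the long-distance subject link, the normalized passive, and the resolved relative pronoun.

```python
# Hedged sketch of a Logical Form as a directed, labeled graph. Relation labels
# (Dsub, Dobj) and node names are illustrative, not NLPwin's actual inventory.

logical_form = {
    "nodes": ["have1", "elephant1", "tusk1", "hunt1"],
    "arcs": [
        # Direct connection between "elephants" and "have", even though a
        # relative clause interrupts them in the surface string.
        ("have1", "Dsub", "elephant1"),
        ("have1", "Dobj", "tusk1"),
        # The passive "which have been hunted" is normalized: "elephant1" is the
        # deep object of "hunt1", with the relative pronoun "which" resolved to
        # its referent instead of appearing as a node of its own.
        ("hunt1", "Dobj", "elephant1"),
    ],
    # Surface words carrying only syntactic information (auxiliaries, etc.) are
    # kept as annotations on nodes rather than as nodes (feature names invented).
    "annotations": {"hunt1": {"Passive": True}},
}

def arcs_from(graph, node):
    """All (relation, target) pairs leaving a node."""
    return [(rel, tgt) for src, rel, tgt in graph["arcs"] if src == node]

print(arcs_from(logical_form, "have1"))  # [('Dsub', 'elephant1'), ('Dobj', 'tusk1')]
```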
Figure 5 also demonstrates some of the shortcomings of Logical Form: 1) should "have" be a concept node in this graph, or should it be interpreted as an arc labeled Part between "elephant" and "tusk"? More generally, what should the inventory of relation labels be, and how should that inventory be determined? And 2) should we infer from this sentence only that "African elephants have been hunted" and that "African elephants have large tusks", or can we infer that "elephants have been hunted" and that they happen to be "African elephants"? We postponed deciding this question of scoping until discourse processing[2], where such questions can be addressed; Logical Form does not represent the ambiguity in scoping.

During development of the NLPwin pipeline (see Figure 2), we considered that there would be a separate component determining word senses following the syntactic analysis of the input. This component was meant to select and/or collate lexical information from multiple dictionaries to represent and expand the lexical meaning of each content word. This view of Word Sense Disambiguation (WSD) was in contrast to the then-nascent interest in WSD in the academic community, which formulated the WSD task as selecting one sense from a fixed inventory of word senses as being correct. Our primary objection to this formulation was that any fixed inventory will necessarily be insufficient as the foundation for a broad-coverage grammar (see Dolan, Vanderwende and Richardson, 2000). For similar reasons, we elected to abandon the pursuit of assigning word senses in NLPwin as well. Today, the field has made great strides in exploring a more flexible notion of lexical meaning with the advent of vector-space representations, which it would be promising to combine with the output of this parser.

While we did not view Word Sense Disambiguation as a separate task, we did design our parser and subsequent components to make use of ever richer lexical information. The sketch grammar relies on the subcategorization frames and other syntactic-semantic codes available from two dictionaries: the Longman Dictionary of Contemporary English (LDOCE) and the American Heritage Dictionary, 3rd edition, for which Microsoft acquired the digital rights. LDOCE in particular provides rich lexical information that facilitates the construction of Logical Form[3]. Such codes, rich as they are, do not support the full semantic processing that is necessary when, for example, determining the correct attachment of prepositional phrases or resolving nominal co-reference. The question was: is it possible to acquire such semantic knowledge automatically, in order to support a broad-coverage parser?
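As a purely hypothetical illustration of how subcategorization codes can constrain parsing, the sketch below records a few invented verb frames and checks whether a proposed complement is licensed before a rule builds the corresponding constituent. The frame names and lexicon entries are made up for this example; they are not the actual LDOCE or NLPwin codes.

```python
# Hypothetical illustration only: invented frame names and entries, not the
# actual LDOCE grammar codes or the NLPwin lexicon format.

LEXICON = {
    "give":  {"frames": {"NP_NP", "NP_PP_to"}},   # ditransitive, or NP plus "to"-PP
    "sleep": {"frames": {"intrans"}},
    "hunt":  {"frames": {"NP", "intrans"}},
}

def licenses(verb: str, frame: str) -> bool:
    """Would a rule attaching this complement pattern to `verb` be licensed?"""
    entry = LEXICON.get(verb)
    return entry is not None and frame in entry["frames"]

# A rule proposing a direct object for "sleep" can be rejected early,
# while "hunt" + NP is allowed.
print(licenses("sleep", "NP"))  # False
print(licenses("hunt", "NP"))   # True
```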
In the early to mid-90s, there was considerable interest in mining dictionaries and other reference works for semantic information broadly speaking. For this reason, we envisioned that where lexical information was not sufficient to support the decisions that needed to be made in the Portrait component, we would acquire such information from machine-readable reference works.

At the time, few broad-coverage parsers were available, so the main thrust was to develop string patterns (regexes) that could be used to identify specific types of semantic information; Hearst (1992) describes the use of such patterns for the acquisition of Hypernymy (is-a terms). Alshawi (1989) parses dictionary definitions using a grammar especially designed for that dictionary ("Longmanese"). We had two concerns about this approach: first, as the need for greater recall increases, writing and refining string patterns becomes more and more complex, in the limit approaching the complexity of full grammar writing and so straying far from the straightforward string patterns one started with; and second, when extracting semantic relations beyond Hypernymy, we found string patterns to be insufficient (see Montemagni and Vanderwende 1992).

Instead, we proposed to parse the dictionary text using the linguistic components already developed, Sketch, Portrait, and Logical Form, ensuring access to robust parsing, in order to bootstrap the acquisition of the semantic knowledge needed to improve the Portrait. This bootstrapping is possible because some linguistic expressions are unambiguous, and so, at each iteration, we can extract from unambiguous text to improve the parsing of ambiguous text (see Vanderwende 1995).

As each definition in the dictionary and on-line encyclopedia was processed and the semantic information was stored for access by Portrait, a picture emerged from connecting all of the graph fragments. When viewed as a database rather than as a look-up table (which is how people use dictionaries), the graph fragments are connected and interesting paths/inferences emerge. To enrich the data further, we then took the step of viewing each graph fragment from the perspective of each content node. Imagine looking at the graph as a mobile and picking it up at each of the objects in turn: the nodes under that object remain the same, but the nodes above it become inverted (illustrated in Figure 6). For example, for the definition of elephant, "an animal with ivory tusks", MindNet stores not only the graph fragment "elephant PART (tusk MATR ivory)" but also "tusk PART-OF elephant" and "ivory MATR-OF tusk"[4].

Figure 6: Logical Form and its inversions.
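Operationally, the inversion illustrated in Figure 6 can be sketched as follows: represent the stored fragment for the elephant definition as relation triples and generate, for every triple, the view from its target node with the label suffixed "-OF". This is an illustrative Python snippet, not the MindNet implementation.

```python
# Illustrative sketch of MindNet-style inversion (not the actual implementation):
# each stored arc is also made available from the perspective of its target node.

# Graph fragment for "elephant": "an animal with ivory tusks"
fragment = [
    ("elephant", "PART", "tusk"),
    ("tusk", "MATR", "ivory"),
]

def invert(arcs):
    """Return the inverted view of each arc: (target, REL-OF, source)."""
    return [(tgt, rel + "-OF", src) for src, rel, tgt in arcs]

for arc in fragment + invert(fragment):
    print(arc)
# ('elephant', 'PART', 'tusk')
# ('tusk', 'MATR', 'ivory')
# ('tusk', 'PART-OF', 'elephant')
# ('ivory', 'MATR-OF', 'tusk')
```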
We called this collection of intersecting graphs MindNet. Figure 7 reflects the picture we saw for the word "bird" when looking at all of the pieces of information that were automatically gleaned from dictionary text:

Figure 7: A fragment of the NLPwin MindNet, centered on the word "bird".

For a person using only the dictionary, it would be very difficult to construct a list of all the different types of birds, all of the parts of a bird, all of the places where a bird may be found, or the types of actions that a bird may perform. But by converting the dictionary to a database and inverting all the semantic relations as shown in Figure 6, MindNet provides rich semantic information for any concept that occurs in text, especially because it is produced by automated methods using a broad-coverage grammar, a grammar that parses fragments as well as it parses complete grammatical input.
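To close with a hedged sketch of the "database, not look-up table" point: once fragments from many definitions are aggregated and every relation is also indexed from its target node, everything known about a concept such as "bird" can be gathered with a single lookup. The triples below are invented in the spirit of Figure 7; they are not actual MindNet contents.

```python
# Hedged sketch of querying an aggregated, inverted MindNet-like store.
# The triples are invented examples, not actual MindNet data.

from collections import defaultdict

TRIPLES = [
    ("robin", "HYP", "bird"),      # a robin is a kind of bird
    ("penguin", "HYP", "bird"),
    ("bird", "PART", "wing"),
    ("bird", "PART", "feather"),
    ("bird", "LOCN", "nest"),
]

# Index every triple, and its inversion, by word, so the collection behaves like
# a database centered on each concept rather than a list of definitions.
INDEX = defaultdict(list)
for src, rel, tgt in TRIPLES:
    INDEX[src].append((rel, tgt))
    INDEX[tgt].append((rel + "-OF", src))

def about(word):
    """Everything the store records about `word`, from either direction."""
    return INDEX[word]

print(about("bird"))
# [('HYP-OF', 'robin'), ('HYP-OF', 'penguin'),
#  ('PART', 'wing'), ('PART', 'feather'), ('LOCN', 'nest')]
```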