). Even when sources are equally trustworthy, what should happen to (apparent) contradictions? Weights computed for specific pieces of the knowledge graph can reflect how frequently that information is encountered, but the source itself should also figure in the weighting scheme (one possible scheme is sketched below). Moreover, MindNet is not simply a database of triples: we preserve the context from which the semantic relations were extracted, and so, in theory, we could resolve apparent contradictions by taking context into account. We did not encounter these concerns because MindNet has only been computed from sources that are categorically true (dictionaries and encyclopedias), but they should be addressed going forward as knowledge is acquired from the web.

The original intent, as shown in Figure 2, was to reduce paraphrases to a canonical representation in a module that we tentatively named “Concepts”, though “Concept Detection” would have been more descriptive. As with Word Sense Disambiguation, we abandoned this module because we were dissatisfied with its underlying assumption that one representation of a concept or complex event is primary over the others, when in reality the expressions are equivalent; equivalence should be fluid and allowed to vary with the needs of the application. Here again, we believe that current research aiming to represent parse fragments in vector space is a promising approach, while emphasizing that it is essential to take the parse and Logical Form structure into account.
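As an illustration of such a weighting scheme, here is a minimal sketch that combines per-fact frequency with per-source trust while keeping the extraction context available for finer-grained resolution. The Triple fields, the SOURCE_TRUST scores, and the resolve strategy are assumptions made for this example, not part of MindNet.

```python
from dataclasses import dataclass

@dataclass
class Triple:
    """One semantic relation mined from text, with its provenance."""
    head: str
    relation: str
    tail: str
    source: str          # hypothetical source class, e.g. "dictionary"
    context: str         # sentence the relation was extracted from
    frequency: int = 1   # times this relation was encountered

# Hypothetical trust scores; curated references such as dictionaries
# and encyclopedias would outrank arbitrary web pages.
SOURCE_TRUST = {"dictionary": 1.0, "encyclopedia": 0.9, "web": 0.4}

def weight(triple: Triple) -> float:
    """Combine how often a fact was seen with how much its source is trusted."""
    return triple.frequency * SOURCE_TRUST.get(triple.source, 0.1)

def resolve(contradictory: list[Triple]) -> Triple:
    """Prefer the best-supported reading; the stored context remains
    available for deeper disambiguation."""
    return max(contradictory, key=weight)
```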
Finally, a few words about the generation grammar (shown on the right-hand side of the rainbow in Figure 1). In NLPwin, we developed two types of generation grammars: rule-based generation components (including those that shipped with Microsoft Word to enable, for example, the rewrite of passive to active) and Amalgam, a set of machine-learned generation modules. Both types of generation grammars were used in production for Machine Translation.
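To make the passive-to-active rewrite concrete, here is a minimal sketch of a rule-based generation step over a toy predicate-argument structure. The Clause fields and the crude morphology are stand-ins invented for the example, not NLPwin's actual rules.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Clause:
    """A toy predicate-argument structure, loosely in the spirit of Logical Form."""
    verb: str            # base form of the main verb
    dsub: Optional[str]  # deep subject (logical agent), if recoverable
    dobj: Optional[str]  # deep object (logical patient)
    passive: bool        # surface voice of the input

def generate(clause: Clause) -> str:
    """Rule-based generation: realize a passive clause in the active
    voice whenever the deep subject is known."""
    if clause.passive and clause.dsub is None:
        # No agent recoverable: keep the passive (toy morphology).
        return f"{clause.dobj} is {clause.verb}ed"
    # Rewrite rule fires: the agent becomes the surface subject.
    return f"{clause.dsub} {clause.verb}s {clause.dobj}"

# "The report was written by Mary" -> "Mary writes the report"
print(generate(Clause("write", "Mary", "the report", passive=True)))
# "The wall was painted" -> "the wall is painted" (no agent to promote)
print(generate(Clause("paint", None, "the wall", passive=True)))
```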
In Summary …

We’ve described some of the aspects of the NLPwin project at Microsoft Research[5]. The lexical and syntactic processing components are designed to be broad-coverage and robust to grammatical errors, allowing parses to be constructed for fragmented and ungrammatical as well as grammatical inputs. These components are largely rule-based grammars, making use of rich lexical and semantic resources derived from online dictionaries. The output of the parsing component, a tree analysis, is converted to a graph-based representation called Logical Form. The goal of Logical Form is to compute the predicate-argument structure for each clause and to normalize differing syntactic realizations of what can be considered the same “meaning”. In so doing, the distance between concepts reflects semantic distance rather than linear distance in the surface realization, bringing related concepts closer together than they might appear at the surface. MindNet is a database of connected Logical Forms, constructed automatically. When reference resources are the source text, MindNet can be viewed as a traditional Knowledge Acquisition method and object; when it is constructed by processing arbitrary text, MindNet is a global representation of all the Logical Forms of that text, which allows browsing the concepts and their semantic connections in that text. In fact, MindNet was considered most compelling as a means for browsing and exploring specific relations mined from a text collection.
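As a rough illustration of that browsing use, the sketch below indexes Logical-Form triples by concept together with the context they were extracted from; the relation names and the tiny API are invented for the example.

```python
from collections import defaultdict

# A toy MindNet: Logical-Form triples indexed by concept so that all
# semantic relations of a concept can be browsed with their contexts.
mindnet = defaultdict(list)

def add_lf(head: str, relation: str, tail: str, context: str) -> None:
    mindnet[head].append((relation, tail, context))

add_lf("elephant", "Hypernym", "animal", "a very large gray animal ...")
add_lf("elephant", "Part", "tusk", "... with two tusks and a trunk")

def browse(concept: str) -> None:
    """List every relation mined for a concept, with its source context."""
    for relation, tail, context in mindnet[concept]:
        print(f"{concept} --{relation}--> {tail}   [{context}]")

browse("elephant")
```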
[1] See Jensen, Karen. 1987. Binary rules and non-binary trees: Breaking down the concept of phrase structure. In Mathematics of Language, ed. A. Manaster-Ramer, 65–86. Amsterdam: John Benjamins Pub. Co.
[2] In fact, the NLPwin system has not addressed this issue to date.
[3] The LDOCE box codes, for instance, provide information on type restrictions and the arguments of verbs. In LDOCE, “persuade” is marked ObjC, indicating that “persuade” exhibits Object Control (i.e., that the object of “persuade” is understood to be the subject of the verb complement). Thus, from the input sentence “I persuaded John to go to the library” it is possible to construct a Logical Form with “John” as the subject of “go to the library”, while for the input sentence “I promised John to go to the library”, the Logical Form is constructed with “I” as the subject of “go to the library”.
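A minimal sketch of how such control information might drive Logical Form construction; the lexicon labels and function below are illustrative, not LDOCE's actual box codes.

```python
# Hypothetical control lexicon in the spirit of the LDOCE box codes.
CONTROL = {"persuade": "object", "promise": "subject"}

def complement_subject(matrix_subject: str, matrix_object: str, verb: str) -> str:
    """Pick the understood subject of an infinitival complement."""
    if CONTROL.get(verb) == "object":
        return matrix_object   # "I persuaded John to go" -> John goes
    return matrix_subject      # "I promised John to go"  -> I go

print(complement_subject("I", "John", "persuade"))  # John
print(complement_subject("I", "John", "promise"))   # I
```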
[4] The algorithm of course also identifies the relation “elephant HYPERNYM animal”, but, in dictionary processing, the information extracted from the differentiae of the definition (the specifications on the hypernym) is true of the word being defined rather than of the hypernym, and so we do not extract that “animals have tusks” but rather that “elephants have tusks”.
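A small sketch of where the differentiae attach (the relation labels and function are hypothetical):

```python
def relations_from_definition(headword, genus, differentiae):
    """Attach differentiae to the headword, not to the genus term: from
    'elephant: a large animal with tusks' we record that elephants,
    not animals in general, have tusks."""
    triples = [(headword, "Hypernym", genus)]
    for relation, value in differentiae:
        triples.append((headword, relation, value))
    return triples

print(relations_from_definition("elephant", "animal", [("Part", "tusk")]))
# [('elephant', 'Hypernym', 'animal'), ('elephant', 'Part', 'tusk')]
```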
[5] At the time of this writing (2014), NLPwin is considered a mature system, with only limited development of the generation and Logical Form components.