Polysemy in a Broad-Coverage Natural Language Processing System

Published by Oxford University Press

MS-NLP is a broad-coverage natural language understanding system that has been under development at Microsoft Research since 1991. Perhaps the most notable characteristic of this effort has been its emphasis on arbitrarily broad coverage of natural language phenomena. The system’s goal is to produce a useful linguistic analysis of any piece of text passed to it, whether that text is formal business prose, casual email, or technical writing from an obscure scientific domain. This emphasis on handling any sort of input has had interesting implications for the design of morphological and syntactic processing. Equally interesting, though, are its implications for semantic processing. The issue of polysemy and the attendant practical task of word sense disambiguation (WSD) take on entirely new dimensions in a system like this, where a word might have innumerable possible meanings. A starting assumption, for example, is that MS-NLP will routinely have to interpret words and technical word senses that are not described in standard reference dictionaries.