{"id":169461,"date":"2002-02-19T14:32:24","date_gmt":"2002-02-19T14:32:24","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/project\/automatic-grammar-induction\/"},"modified":"2019-08-14T14:41:22","modified_gmt":"2019-08-14T21:41:22","slug":"automatic-grammar-induction","status":"publish","type":"msr-project","link":"https:\/\/www.microsoft.com\/en-us\/research\/project\/automatic-grammar-induction\/","title":{"rendered":"Automatic Grammar Induction"},"content":{"rendered":"
Automatic learning of speech recognition grammars from example sentences to ease the development of spoken language systems.
Researcher Ye-Yi Wang wants more time for vacation, so he is teaching his computer to do some of his work for him.
Wang has been working on Spoken Language Understanding for the MiPad project since he joined Microsoft Research, developing a robust parser and the understanding grammars for several projects. "Grammar development is painful and error-prone. It is time-consuming, tedious, and it requires expertise in computational linguistics. Occupied with the work of speech-enabling applications, I've never had enough time to use up my three-week vacation in recent years," says Wang.

According to Wang, many state-of-the-art conversational systems use semantic-based robust understanding. In this approach, a computer "understands" speech by having a robust parser normalize the output of a speech recognizer into a canonical representation, guided by a handcrafted semantic grammar (a toy sketch of this pipeline appears at the end of this article). While the robust parser can be written once and reused across tasks, a new semantic grammar must be developed for every application domain. Because of this, speech-enabled applications have mostly been developed in large human language technology labs as prototype research systems.

"Microsoft is a platform company. It is extremely important to provide developers with easy-to-use tools for our platforms, so that speech-enabled applications and web services can become mainstream," says Alex Acero, Wang's manager, who is also involved in the project.

Their focus is on smart tools that let an average developer speech-enable applications or web services. This differs from work in automatic grammar inference, which tries to learn grammars automatically from a corpus of training sentences. Most research in grammar inference has focused on toy problems, and applying such approaches to grammar structure learning for natural language has not produced satisfactory results for natural language understanding. According to Wang, the limited success is due to the complexity of the problem and the sparseness of typical training data relative to the complexity of the target grammar: there is no good generalization mechanism to correctly cover the large variety of language constructions unseen in the training data. "Instead of ambiguous automatic grammar inference, we adopt a very practical approach by integrating multiple sources of easy-to-get information," says Wang.

Several general technologies are currently being pursued to take advantage of these information sources.
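To make the semantic-based robust understanding described above concrete, here is a minimal sketch in Python. Everything in it is a hypothetical illustration, not code from the MiPad project: the calendar domain, the slot patterns in SEMANTIC_GRAMMAR, and the robust_parse function are invented for this example. It shows the division of labor the article describes: a handcrafted semantic grammar carries all the domain knowledge, while the parser itself is generic, skipping words it cannot account for and emitting a canonical semantic frame.

```python
# Minimal sketch of semantic-based robust understanding (hypothetical example,
# not MiPad code). A handcrafted semantic grammar maps each semantic slot to
# the surface patterns that can fill it; the parser is domain-independent.
import re

# Handcrafted semantic grammar for a toy calendar domain.
SEMANTIC_GRAMMAR = {
    "command": r"\b(schedule|cancel|move)\b",
    "person":  r"\bwith\s+([a-z]+)\b",
    "day":     r"\b(monday|tuesday|wednesday|thursday|friday)\b",
    "time":    r"\b(\d{1,2}(?::\d{2})?\s*(?:am|pm))\b",
}

def robust_parse(utterance: str) -> dict:
    """Normalize a recognizer's word string into a canonical semantic frame.

    Robustness here simply means that fillers, disfluencies, and
    misrecognized words the grammar does not cover are skipped rather
    than causing the parse to fail.
    """
    frame = {}
    for slot, pattern in SEMANTIC_GRAMMAR.items():
        match = re.search(pattern, utterance, flags=re.IGNORECASE)
        if match:
            frame[slot] = match.group(1).lower()
    return frame

if __name__ == "__main__":
    # Noisy recognizer output: the fillers "uh" and "um" are simply ignored.
    utterance = "uh please schedule um a meeting with alice on friday at 2:30 pm"
    print(robust_parse(utterance))
    # {'command': 'schedule', 'person': 'alice', 'day': 'friday', 'time': '2:30 pm'}
```

Only the SEMANTIC_GRAMMAR table is domain-specific; robust_parse would be written once and reused, which is why, as the article notes, authoring a new grammar for each domain dominates the development cost.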