AskMSR: Question Answering Using the Worldwide Web
- Michele Banko
- Eric Brill
- Susan Dumais
- Jimmy Lin
MSR-TR-2016-30
AAAI Spring Symposium on Mining Answers from Text and Knowledge Bases
The design of the AskMSR question answering system is motivated by recent observations in natural language processing that, for many applications, significant improvements in accuracy can be attained simply by increasing the amount of data used for learning (e.g., Banko & Brill, 2001). Rather than relying heavily on linguistically intensive techniques, we took advantage of the vast amount of online text available via the web to develop a simple but effective question answering system. Many groups working on question answering use a variety of linguistic resources, such as part-of-speech tagging, parsing, named entity extraction, and WordNet. We chose instead to focus on the tremendous resource that the web provides simply as a gigantic data repository. The web, which is home to billions of pages of electronic text, is orders of magnitude larger than the TREC QA document collection, which consists of fewer than 1 million documents.
Recently, other researchers have also looked to the web as a resource for question answering. These systems typically perform complex parsing and entity extraction for both queries and best-matching web pages (Kwok et al., 2001; Buchholz, 2001), which limits the number of web pages that they can analyze in detail. Other systems require term weighting for selecting or ranking the best-matching passages (Clarke et al., 2001; Kwok et al., 2001), which requires auxiliary data structures. Our approach is distinguished from these by its simplicity and its efficient use of web resources.
An early version of the AskMSR system attained the sixth-best accuracy under lenient scoring in the TREC 2001 Question Answering Track (Brill et al., 2001; Voorhees and Harman, 2001).