Downloads
Microsoft KaggleDBQA Dataset: Realistic Evaluation of Text-to-SQL Parsers
July 2021
Microsoft KaggleDBQA is a cross-domain and complex evaluation dataset of real Web databases, with domain-specific data types, original formatting, and unrestricted questions. It also provides database documentation, which contains rich in-domain knowledge. The nature of obscure and abbreviated column/table names…
Multi-Task Deep Neural Networks for Natural Language Understanding
March 2019
This PyTorch package implements the Multi-Task Deep Neural Networks (MT-DNN) for Natural Language Understanding.
Search-based Neural Structured Learning for Sequential Question Answering
August 2018
This project contains the source code of the Dynamic Neural Semantic Parser (DynSP), based on DyNet, described in the paper “Search-based Neural Structured Learning for Sequential Question Answering”.
MSR Abstractive Text Compression Dataset
January 2017
This dataset contains sentences and short paragraphs with corresponding shorter (compressed) versions. There are up to five compressions for each input text, together with quality judgements of their meaning preservation and grammaticality. The dataset is derived using source texts from…
Visual Question Generation dataset
October 2016
We introduce this dataset in order to support the novel task of Visual Question Generation (VQG), where, given an image, the system should ‘ask a natural and engaging question’. This dataset can be used to support research on common sense…
If-This-Then-That Programs and Descriptions Corpus
July 2015
This download primarily contains a list of URLs with paired natural language descriptions and code, as well as a separation of those URLs into training, development, and test data. In addition, code is included to help the downloader retrieve those…
Smart Selection Dataset
April 2014
Smart selection is the task of predicting the span of text that a user intended to select after they touched on a single word on a touch-enabled device. The Smart Selection Dataset consists of crowd-sourced smart selection annotations on publicly…
Powergrading Short Answer Grading Corpus
October 2013
This corpus contains the original data analyzed in the following paper: Basu, Jacobs, and Vanderwende, “Powergrading: a Clustering Approach to Amplify Human Effort for Short Answer Grading,” Transactions of the ACL, 2013. It consists of responses from 100 + 698…
BioNLP Shared Task 2011 Dev-Set Results
May 2011
The Natural Language Processing group at Microsoft Research is sharing the development-set results for the BioNLP Shared Task 2011 to help analysis of whether there is an upper bound on recall and precision in the trigger-detection process. It is interesting…