Downloads
Microsoft KaggleDBQA Dataset: Realistic Evaluation of Text-to-SQL Parsers
July 2021
Microsoft KaggleDBQA is a cross-domain, complex evaluation dataset of real Web databases, with domain-specific data types, original formatting, and unrestricted questions. It also provides database documentation, which contains rich in-domain knowledge. The nature of obscure and abbreviated column/table names…
Multi-Task Deep Neural Networks for Natural Language Understanding
March 2019
This PyTorch package implements the Multi-Task Deep Neural Networks (MT-DNN) for Natural Language Understanding.
Search-based Neural Structured Learning for Sequential Question Answering
August 2018
This project contains the source code of the Dynamic Neural Semantic Parser (DynSP), based on DyNet, described in the paper “Search-based Neural Structured Learning for Sequential Question Answering”.
MSR Abstractive Text Compression Dataset
January 2017
This dataset contains sentences and short paragraphs with corresponding shorter (compressed) versions. There are up to five compressions for each input text, together with quality judgements of their meaning preservation and grammaticality. The dataset is derived using source texts from…
Visual Question Generation dataset
October 2016
We introduce this dataset in order to support the novel task of Visual Question Generation (VQG), where, given an image, the system should ‘ask a natural and engaging question’. This dataset can be used to support research on common sense…
If-This-Then-That Programs and Descriptions Corpus
July 2015
This download primarily contains a list of URLs with paired natural language descriptions and code, as well as a split of those URLs into training, development, and test data. In addition, code is included to help the downloader retrieve those…
Smart Selection Dataset
April 2014
Smart selection is the task of predicting the span of text that a user intended to select after touching a single word on a touch-enabled device. The Smart Selection Dataset consists of crowd-sourced smart selection annotations on publicly…
Powergrading Short Answer Grading Corpus
October 2013
This corpus contains the original data analyzed in the following paper: Basu, Jacobs, and Vanderwende, “Powergrading: a Clustering Approach to Amplify Human Effort for Short Answer Grading,” Transactions of the ACL, 2013. It consists of responses from 100 + 698…
BioNLP Shared Task 2011 Dev-Set Results
May 2011
The Natural Language Processing group at Microsoft Research is sharing the development-set results for the BioNLP Shared Task 2011 to help analyze whether there is an upper bound on recall and precision in the trigger-detection process. It is interesting…