Natural Language Processing Group

Downloads

Microsoft KaggleDBQA Dataset: Realistic Evaluation of Text-to-SQL Parsers

July 2021

Microsoft KaggleDBQA is a cross-domain and complex evaluation dataset of real Web databases, with domain-specific data types, original formatting, and unrestricted questions. It also provides database documentation, which contains rich in-domain knowledge. The nature of obscure and abbreviated column/table names…

Download

Multi-Task Deep Neural Networks for Natural Language Understanding

March 2019

This PyTorch package implements the Multi-Task Deep Neural Networks (MT-DNN) for Natural Language Understanding.

GitHub

Search-based Neural Structured Learning for Sequential Question Answering

August 2018

This project contains the source code of the Dynamic Neural Semantic Parser (DynSP), based on DyNet, described in the paper “Search-based Neural Structured Learning for Sequential Question Answering”.

GitHub

MSR Abstractive Text Compression Dataset

January 2017

This dataset contains sentences and short paragraphs with corresponding shorter (compressed) versions. There are up to five compressions for each input text, together with quality judgements of their meaning preservation and grammaticality. The dataset is derived using source texts from…

Download

Visual Question Generation dataset

October 2016

We introduce this dataset in order to support the novel task of Visual Question Generation (VQG), where, given an image, the system should ‘ask a natural and engaging question’. This dataset can be used to support research on common sense…

Download

If-This-Then-That Programs and Descriptions Corpus

July 2015

This download primarily contains a list of URLs with paired natural language descriptions and code, as well as a separation of those URLs into training, development, and test data. In addition, code is included to help the downloader retrieve those…

Download

Smart Selection Dataset

April 2014

Smart selection is the task of predicting the span of text that a user intended to select after they touched on a single word on a touch-enabled device. The Smart Selection Dataset consists of crowd-sourced smart selection annotations on publicly…

Download

Powergrading Short Answer Grading Corpus

October 2013

This corpus contains the original data analyzed in the following paper: Basu, Jacobs, and Vanderwende, “Powergrading: a Clustering Approach to Amplify Human Effort for Short Answer Grading,” Transactions of the ACL, 2013. It consists of responses from 100 + 698…

Download

BioNLP Shared Task 2011 Dev-Set Results

May 2011

The Natural Language Processing group at Microsoft Research is sharing the development-set results for the BioNLP Shared Task 2011 to help analysis of whether there is an upper bound on recall and precision in the trigger-detection process. It is interesting…

Download