
Natural Language Processing Group

Downloads

Microsoft KaggleDBQA Dataset: Realistic Evaluation of Text-to-SQL Parsers

July 2021

Microsoft KaggleDBQA is a cross-domain and complex evaluation dataset of real Web databases, with domain-specific data types, original formatting, and unrestricted questions. It also provides database documentation, which contains rich in-domain knowledge. The nature of obscure and abbreviated column/table names…

Download

Multi-Task Deep Neural Networks for Natural Language Understanding

March 2019

This PyTorch package implements the Multi-Task Deep Neural Networks (MT-DNN) for Natural Language Understanding.

GitHub

Search-based Neural Structured Learning for Sequential Question Answering

August 2018

This project contains the source code of the Dynamic Neural Semantic Parser (DynSP), based on DyNet, described in the paper “Search-based Neural Structured Learning for Sequential Question Answering”.

GitHub

MSR Abstractive Text Compression Dataset

January 2017

This dataset contains sentences and short paragraphs with corresponding shorter (compressed) versions. There are up to five compressions for each input text, together with quality judgements of their meaning preservation and grammaticality. The dataset is derived using source texts from…

Download

Visual Question Generation dataset

October 2016

We introduce this dataset in order to support the novel task of Visual Question Generation (VQG), where, given an image, the system should ‘ask a natural and engaging question’. This dataset can be used to support research on common sense…

Download

If-This-Then-That Programs and Descriptions Corpus

July 2015

This download primarily contains a list of URLs with paired natural language descriptions and code, as well as a separation of those URLs into training, development, and test data. In addition, code is included to help the downloader retrieve those…

Download

Smart Selection Dataset

April 2014

Smart selection is the task of predicting the span of text that a user intended to select after they touched on a single word on a touch-enabled device. The Smart Selection Dataset consists of crowd-sourced smart selection annotations on publicly…

Download

Powergrading Short Answer Grading Corpus

October 2013

This corpus contains the original data analyzed in the following paper: Basu, Jacobs, and Vanderwende, “Powergrading: a Clustering Approach to Amplify Human Effort for Short Answer Grading,” Transactions of the ACL, 2013. It consists of responses from 100 + 698…

Download

BioNLP Shared Task 2011 Dev-Set Results

May 2011

The Natural Language Processing group at Microsoft Research is sharing the development-set results for the BioNLP Shared Task 2011 to help analysis of whether there is an upper bound on recall and precision in the trigger-detection process. It is interesting…

Download