Downloads
Microsoft KaggleDBQA Dataset: Realistic Evaluation of Text-to-SQL Parsers
July 2021
Microsoft KaggleDBQA is a cross-domain and complex evaluation dataset of real Web databases, with domain-specific data types, original formatting, and unrestricted questions. It also provides database documentation, which contains rich in-domain knowledge. The nature of obscure and abbreviated column/table names…
Visual Question Generation dataset
October 2016
We introduce this dataset in order to support the novel task of Visual Question Generation (VQG), where, given an image, the system should ‘ask a natural and engaging question’. This dataset can be used to support research on common sense…
If-This-Then-That Programs and Descriptions Corpus
July 2015
This download primarily contains a list of URLs with paired natural language descriptions and code, as well as a split of those URLs into training, development, and test data. In addition, code is included to help the downloader retrieve those…
Multi-Task Deep Neural Networks for Natural Language Understanding
March 2019
This PyTorch package implements the Multi-Task Deep Neural Networks (MT-DNN) for Natural Language Understanding.
Smart Selection Dataset
April 2014
Smart selection is the task of predicting the span of text that a user intended to select after touching a single word on a touch-enabled device. The Smart Selection Dataset consists of crowd-sourced smart selection annotations on publicly…
Microsoft Research Paraphrase Corpus
March 2005
This download consists of data only: a text file containing 5800 pairs of sentences which have been extracted from news sources on the web, along with human annotations indicating whether each pair captures a paraphrase/semantic equivalence relationship. No more than…
Microsoft Research Asia Chinese Word-Segmentation Data Set
August 2007
A set of manually annotated Chinese word-segmentation data and specifications for training and testing a Chinese word-segmentation system for research purposes. The data was extracted from the People’s Daily, which we have licensed for commercial usage, and the annotation was…
BioNLP Shared Task 2011 Dev-Set Results
May 2011
The Natural Language Processing group at Microsoft Research is sharing the development-set results for the BioNLP Shared Task 2011 to help analyze whether there is an upper bound on recall and precision in the trigger-detection process. It is interesting…
MSR Abstractive Text Compression Dataset
January 2017
This dataset contains sentences and short paragraphs with corresponding shorter (compressed) versions. There are up to five compressions for each input text, together with quality judgements of their meaning preservation and grammaticality. The dataset is derived using source texts from…