Downloads
Generative Neural Visual Artist (GeNeVA) – Datasets – Generation Code
May 2019
Scripts to generate the CoDraw and i-CLEVR datasets used for the Generative Neural Visual Artist (GeNeVA) task proposed in Tell, Draw, and Repeat: Generating and modifying images based on continual linguistic instruction.
MS MARCO
May 2019
MS MARCO is a collection of datasets focused on deep learning in search. The first dataset was a question answering dataset featuring 100,000 real Bing questions, each with a human-generated answer. Since then we released a 1,000,000 question dataset, a…
FigureQA Dataset
March 2018
Answering questions about a given image is a difficult task, requiring both an understanding of the image and the accompanying query. Microsoft Montreal’s FigureQA dataset introduces a new visual reasoning task for research, specific to graphical plots and figures. The…
Frames Dataset
March 2018
Frames is a dataset designed to encourage research towards conversational agents that can support decision-making in complex settings, in this case booking a vacation that includes flights and a hotel. More than just searching a database, we believe the next…
NewsQA Dataset
March 2018
The purpose of Microsoft Montreal’s NewsQA dataset is to help the research community build algorithms that are capable of answering questions requiring human-level comprehension and reasoning skills. Leveraging CNN articles from the DeepMind Q&A Dataset, we prepared a crowd-sourced machine…
nlg-eval
January 2018
Evaluation code for various unsupervised automated metrics for Natural Language Generation (NLG). It takes as input a hypothesis file and one or more reference files, and outputs the metric values. Rows across these files should correspond to the same…
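The row-alignment convention described above (line i of the hypothesis file and line i of every reference file describe the same example) can be illustrated with a minimal Python sketch. This is not nlg-eval's implementation or API: `load_aligned` and `clipped_unigram_precision` are hypothetical names, whitespace tokenization is an assumption, and the metric shown is only a simple stand-in (corpus-level clipped unigram precision, the n=1 term of BLEU's modified precision).

```python
from collections import Counter
from pathlib import Path


def load_aligned(hyp_path, ref_paths):
    """Hypothetical helper: pair each hypothesis line with its reference lines.

    Assumes row i of every file describes the same example, so all files
    must contain the same number of lines.
    """
    hyps = Path(hyp_path).read_text(encoding="utf-8").splitlines()
    refs = [Path(p).read_text(encoding="utf-8").splitlines() for p in ref_paths]
    for path, rows in zip(ref_paths, refs):
        if len(rows) != len(hyps):
            raise ValueError(f"{path} has {len(rows)} rows, expected {len(hyps)}")
    # Regroup column-wise: each hypothesis with all references for that row.
    return [(hyp, [rows[i] for rows in refs]) for i, hyp in enumerate(hyps)]


def clipped_unigram_precision(pairs):
    """Stand-in metric: corpus-level clipped unigram precision.

    Each hypothesis token counts as matched at most as many times as it
    appears in any single reference (whitespace tokenization assumed).
    """
    matched = total = 0
    for hyp, refs in pairs:
        hyp_counts = Counter(hyp.split())
        # Highest count of each token in any one reference.
        max_ref = Counter()
        for ref in refs:
            for tok, count in Counter(ref.split()).items():
                max_ref[tok] = max(max_ref[tok], count)
        matched += sum(min(c, max_ref[tok]) for tok, c in hyp_counts.items())
        total += sum(hyp_counts.values())
    return matched / total if total else 0.0
```

For example, with a hypothesis file containing `the cat sat` and `hello hello moon`, and two reference files aligned row by row, `clipped_unigram_precision(load_aligned("hyp.txt", ["ref1.txt", "ref2.txt"]))` scores all rows together. Checking row counts up front makes a misaligned file fail loudly instead of silently pairing the wrong lines.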