Downloads
Presentation at datafest, May 2019 in Moscow
May 2019
The presentation starts with a brief introduction to Reinforcement Learning (RL) and an overview of its successes. Even though these achievements are compelling, state-of-the-art algorithms require an unreasonable amount of data. Moreover, they sometimes converge to terrible solutions. These restrictions…
TextWorld
July 2019
TextWorld is a text-based framework that generates games for training artificially intelligent agents to play text adventure games. The goal is for this project to advance the state of the art of AI research and to…
TREC Deep Learning Track
April 2024
The TREC Deep Learning Track studies information retrieval in a large training data regime. This is the case where the number of training queries with at least one positive label is at least in the tens of thousands, if not…
MS MARCO
May 2019
MS MARCO is a collection of datasets focused on deep learning in search. The first dataset was a question answering dataset featuring 100,000 real Bing questions, each with a human-generated answer. Since then, we have released a 1,000,000 question dataset, a…
FigureQA Dataset
March 2018
Answering questions about a given image is a difficult task, requiring both an understanding of the image and the accompanying query. Microsoft Montreal’s FigureQA dataset introduces a new visual reasoning task for research, specific to graphical plots and figures. The…
Generative Neural Visual Artist (GeNeVA) – Training and Evaluation Code
September 2019
Code to train and evaluate the GeNeVA-GAN model for the GeNeVA task proposed in our ICCV 2019 paper, "Tell, Draw, and Repeat: Generating and Modifying Images Based on Continual Linguistic Instruction."
MetaLWOz: A Dataset of Multi-Domain Dialogues for the Fast Adaptation of Conversation Models
July 2019
We introduce the Meta-Learning Wizard of Oz (MetaLWOz) dialogue dataset for developing fast adaptation methods for conversation models. This data can be used to train task-oriented dialogue models, specifically to develop methods to quickly simulate user responses with a small…
Python Reasoning Challenges
May 2021
A short Python Reasoning Challenge can replace an entire page of English describing a typical programming problem. The goal is to teach computers how to program. This OSS repository will contain a dataset of short Python challenges. Most of them…
Protein sequence models
November 2021
Codebase for generative modeling of protein sequence and structure, including CNN and GNN implementations and custom data-handling code.