Downloads
ORCAS: Open Resource for Click Analysis in Search
April 2024
ORCAS is a click-based dataset associated with the TREC Deep Learning Track. It covers 1.4 million of the TREC DL documents, providing 18 million connections to 10 million distinct queries.
TREC Deep Learning Track
April 2024
The TREC Deep Learning Track studies information retrieval in a large training data regime. This is the case where the number of training queries with at least one positive label is at least in the tens of thousands, if not…
Tip of the Tongue Known Item Retrieval Dataset for Movie Identification
August 2021
The Tip of the Tongue (ToT) dataset is from the paper Tip of the Tongue Known-Item Retrieval: A Case Study in Movie Identification. It comprises 758 question/answer pairs scraped from the website iRememberThisMovie.com between 2013 and 2018. These…
Conformer-Kernel Model with Query Term Independence (TREC Deep Learning Quick Start)
March 2021
This is a quick start guide for the document ranking task in the TREC Deep Learning (TREC-DL) benchmark. If you are new to TREC-DL, then this repository may make it more convenient for you to download all the required datasets…
MS MARCO
May 2019
MS MARCO is a collection of datasets focused on deep learning in search. The first dataset was a question answering dataset featuring 100,000 real Bing questions and a human-generated answer. Since then we released a 1,000,000 question dataset, a…
IR metrics for R
April 2018
This is a small library for implementing several standard "test collection" or "offline" evaluation measures for search systems. See: https://github.com/Microsoft/irmetrics-r
Dual Word Embeddings Trained on Bing Queries
February 2016
This data is being released for research purposes only. The DESM Word Embeddings dataset may include terms that some may consider offensive, indecent or otherwise objectionable. Microsoft has not reviewed or modified the content of the dataset. Microsoft is providing…
People
Nick Craswell
Principal Architect
Bhaskar Mitra
Principal Researcher
Paul Thomas
Senior Applied Scientist
Milad Shokouhi
Principal Applied Scientist