{"id":273699,"date":"2017-03-01T14:17:56","date_gmt":"2017-03-01T22:17:56","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-project&p=273699"},"modified":"2017-11-28T22:40:04","modified_gmt":"2017-11-29T06:40:04","slug":"letor-learning-rank-information-retrieval","status":"publish","type":"msr-project","link":"https:\/\/www.microsoft.com\/en-us\/research\/project\/letor-learning-rank-information-retrieval\/","title":{"rendered":"LETOR: Learning to Rank for Information Retrieval"},"content":{"rendered":"
LETOR is a package of benchmark data sets for research on LEarning TO Rank, which contains standard features, relevance judgments, data partitioning, evaluation tools, and several baselines. Version 1.0 was released in April 2007, version 2.0 in December 2007, and version 3.0 in December 2008. The current version, 4.0, was released in July 2009. Unlike the previous versions (v3.0 was an update of v2.0, and v2.0 an update of v1.0), LETOR 4.0 is an entirely new release. It uses the Gov2 web page collection (~25M pages) and two query sets from the Million Query track of TREC 2007 and TREC 2008, referred to as MQ2007 and MQ2008 for short. MQ2007 contains about 1,700 queries with labeled documents, and MQ2008 about 800 queries with labeled documents.
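Each query in these sets comes with a list of judged documents, each represented by a feature vector and a graded relevance label. As a concrete illustration, the sketch below parses one record in the SVMlight-style layout commonly used for learning-to-rank feature files; the exact field layout and the trailing comment are assumptions, not details stated on this page.

```python
# Minimal sketch of parsing one LETOR-style feature line, assuming the common
# SVMlight-like layout: "<label> qid:<id> 1:<v1> 2:<v2> ... # <comment>".
# The layout and the trailing comment are assumptions, not taken from this page.

def parse_letor_line(line):
    """Return (relevance_label, query_id, feature_dict) for one line."""
    line = line.split("#", 1)[0].strip()        # drop the trailing comment, if any
    tokens = line.split()
    label = int(tokens[0])                      # graded relevance judgment
    qid = tokens[1].split(":", 1)[1]            # "qid:10032" -> "10032"
    features = {}
    for tok in tokens[2:]:
        idx, val = tok.split(":", 1)
        features[int(idx)] = float(val)         # 1-based feature index -> value
    return label, qid, features

# Hypothetical line in the assumed format:
label, qid, feats = parse_letor_line("2 qid:10032 1:0.056537 2:0.000000 # docid = GX029")
print(label, qid, len(feats))                   # -> 2 10032 2
```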
LETOR 4.0

- Tao Qin, Tie-Yan Liu, Jun Xu, and Hang Li. LETOR: A Benchmark Collection for Research on Learning to Rank for Information Retrieval. Information Retrieval Journal, 2010.
- Tao Qin and Tie-Yan Liu. Introducing LETOR 4.0 Datasets. arXiv preprint arXiv:1306.2597.

Datasets

LETOR 4.0 contains 8 datasets for four ranking settings, derived from the two query sets and the Gov2 web page collection. A 5-fold cross-validation strategy is adopted, and the 5-fold partitions are included in the package. Each fold contains three subsets: a training set, a validation set, and a test set.
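To make the partitioning concrete, the sketch below runs a 5-fold experiment loop. The directory and file names (Fold1 through Fold5 with train.txt, vali.txt, and test.txt) and the train_fn/evaluate_fn callbacks are illustrative assumptions about the package layout, not details confirmed on this page.

```python
# Minimal sketch of a 5-fold experiment over an MQ2007/MQ2008-style package.
# Directory names (Fold1..Fold5) and file names (train.txt, vali.txt, test.txt)
# are assumptions about the on-disk layout, not taken from this page.

import os

def run_cross_validation(root, train_fn, evaluate_fn):
    """Train on each fold's training set, tune on its validation set,
    and return the average score on the five test sets."""
    scores = []
    for k in range(1, 6):
        fold = os.path.join(root, f"Fold{k}")
        model = train_fn(os.path.join(fold, "train.txt"),
                         os.path.join(fold, "vali.txt"))    # validation set for tuning
        scores.append(evaluate_fn(model, os.path.join(fold, "test.txt")))
    return sum(scores) / len(scores)
```

Hyperparameters are tuned per fold on the validation set, and only the held-out test sets contribute to the reported average.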