Large-scale Learning to Rank using Boosted Decision Trees
- Krysta M. Svore
- Christopher J.C. Burges
in Scaling Up Machine Learning: Parallel and Distributed Approaches
Published by Cambridge University Press | 2011
The Web search ranking task has become increasingly important due to the rapid growth of the Internet. With the growth of the Web and the number of Web search users, the amount of available training data for learning Web ranking models has also increased. We investigate the problem of learning to rank on a cluster using Web search data composed of 140,000 queries and approximately fourteen million URLs. For datasets much larger than this, distributed computing will become essential, due to both speed and memory constraints. We compare against a baseline algorithm that has been carefully engineered to allow training on the full dataset using a single machine, in order to evaluate the loss or gain incurred by the distributed algorithms we consider. The underlying algorithm we use is a boosted tree ranking algorithm called LambdaMART, where a split at a given vertex in each decision tree is determined by the split criterion for a particular feature. Our contributions are twofold. First, we implement a method for improving the speed of training when the training data fits in main memory on a single machine by distributing the vertex split computations of the decision trees. The resulting model is equivalent to the model produced by centralized training, but is obtained with faster training times. Second, we develop a training method for the case where the training data size exceeds the main memory of a single machine. This second approach is based on data distribution and scales easily to far larger datasets, e.g., billions of examples. Results of our methods on a real-world Web dataset indicate significant improvements in training speed.
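To make the first contribution concrete, the sketch below illustrates feature-distributed split finding for a single tree vertex: each worker owns a subset of feature columns, proposes its best split, and a reduce step picks the overall winner, so the resulting tree matches centralized training. This is a minimal illustration, not the authors' implementation; the function names, the simple variance-reduction split criterion, and the use of NumPy are assumptions for the example.

```python
# Hedged sketch (assumed names and criterion, not the chapter's code):
# feature-distributed computation of the best split at one tree vertex.
import numpy as np

def best_split_for_features(X_cols, targets, feature_ids):
    """Best (gain, feature_id, threshold) over this worker's feature columns,
    using sum-of-squared-error reduction as the split criterion."""
    parent_sse = np.var(targets) * len(targets)
    best = (0.0, None, None)
    for fid, col in zip(feature_ids, X_cols):
        for thr in np.unique(col)[:-1]:          # candidate thresholds for this feature
            left, right = targets[col <= thr], targets[col > thr]
            sse = np.var(left) * len(left) + np.var(right) * len(right)
            gain = parent_sse - sse
            if gain > best[0]:
                best = (gain, fid, thr)
    return best

def global_best_split(X, targets, worker_feature_slices):
    """Reduce step: each 'worker' proposes its local best split; the global
    split is the maximum-gain candidate (in practice, workers run in parallel)."""
    candidates = [
        best_split_for_features(X[:, s].T, targets,
                                list(range(*s.indices(X.shape[1]))))
        for s in worker_feature_slices
    ]
    return max(candidates, key=lambda c: c[0])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.random((1000, 8))
    y = (X[:, 2] > 0.5).astype(float) + 0.1 * rng.standard_normal(1000)
    workers = [slice(0, 4), slice(4, 8)]         # two workers, four features each
    print(global_best_split(X, y, workers))
```

Because only per-feature split statistics (not the training examples) are exchanged, this scheme speeds up training when the data fits in memory on each machine; the data-distributed second approach, by contrast, partitions the examples themselves across machines.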