Efficient and Effective Link Analysis with Precomputed SALSA Maps
- Marc Najork ,
- Nick Craswell
17th ACM Conference on Information and Knowledge Management (CIKM) |
Published by Association for Computing Machinery, Inc.
SALSA is a link-based ranking algorithm that takes the result set of a query as input, extends the set to include additional neighboring documents in the web graph, and performs a random walk on the induced subgraph. The stationary probability distribution of this random walk, used as a relevance score, is significantly more effective for ranking purposes than popular query-independent link-based ranking algorithms such as PageRank. nfortunately, this requires significant effort at query-time, to access the link graph and compute the stationary probability distribution. In this paper, we explore whether it is possible to perform most of the computation off-line, prior to the arrival of any queries. The off-line phase of our approach computes a “score map” for each node in the web graph by performing a SALSA-like algorithm on the neighborhood of that node and retaining the scores of the most promising nodes in the neighborhood graph. The on-line phase takes the results to a query, retrieves the score map of each result, and returns for each result a score that is the sum of the matching scores from each score map. We evaluated this algorithm on a collection of about 28,000 queries with partially labeled results, and found that it is significantly more effective than PageRank, although not quite as effective as SALSA. We also studied the trade-off between ranking effectiveness and space requirements.
Copyright © 2007 by the Association for Computing Machinery, Inc. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Publications Dept, ACM Inc., fax +1 (212) 869-0481, or permissions@acm.org. The definitive version of this paper can be found at ACM's Digital Library --http://www.acm.org/dl/.