Query-independent evidence in home page finding

Published by ACM Press

\urlhttp://research.microsoft.com/users/nickcr/pubs/upstill_tois03.pdf

Hyperlink recommendation evidence, that is, evidence based on the structure of a web’s link graph, is widely exploited by commercial Web search systems. However there is little published work to support its popularity. Another form of query-independent evidence, URL-type, has been shown to be beneficial on a home page finding task.We compared the usefulness of these types of evidence on the home page finding task, combined with both content and anchor text baselines. Our experiments made use of five query sets spanning three corpora—one enterprise crawl, and the WT10g and VLC2 Web test collections.

We found that, in optimal conditions, all of the query-independent methods studied (in-degree, URL-type, and two variants of PageRank) offered a better than random improvement on a contentonly baseline. However, only URL-type offered a better than random improvement on an anchor text baseline. In realistic settings, for either baseline, only URL-type offered consistent gains. In combination with URL-type the anchor text baseline was more useful for finding popular home pages, but URL-type with content was more useful for finding randomly selected home pages. We conclude that a general home page finding system should combine evidence from document content, anchor text, and URL-type classification.