The Threshold Algorithm: From Middleware Systems to the Relational Engine
The answer to a top-k query is an ordered set of tuples, where the ordering is based on how closely each tuple matches the query. In the context of middleware systems, new algorithms to answer top-k queries have been recently proposed. Among these, the Threshold Algorithm, or TA, is the most well known instance due to its simplicity and memory requirements. TA is based on an early-termination condition and can evaluate top-k queries without examining all the tuples. This top-k query model is prevalent over middleware systems, but also over plain relational data. In this work we analyze the challenges that must be addressed to adapt TA to a relational database system. We show that, depending on the available indexes, many alternative TA strategies can be used to answer a given query. Choosing the best alternative requires a cost model that can be seamlessly integrated with that of current optimizers. In this work we address these challenges and conduct an extensive experimental evaluation of the resulting techniques by characterizing which scenarios can take advantage of TA-like algorithms to answer top-k queries in relational database systems.
Copyright © 2007 IEEE. Reprinted from IEEE Computer Society. This material is posted here with permission of the IEEE. Internal or personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution must be obtained from the IEEE by writing to pubs-permissions@ieee.org. By choosing to view this document, you agree to all provisions of the copyright laws protecting it.