Improving Transformer-Kernel Ranking Model Using Conformer and Query Term Independence

Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval

Published by ACM

The Transformer-Kernel (TK) model has demonstrated strong reranking performance on the TREC Deep Learning benchmark, and can be considered an efficient (but slightly less effective) alternative to other Transformer-based architectures that employ (i) large-scale pretraining (high training cost), (ii) joint encoding of query and document (high inference cost), and (iii) a larger number of Transformer layers (both high training and high inference costs). Subsequently, a variant of the TK model, called TKL, has been developed that incorporates local self-attention to efficiently process longer input sequences in the context of document ranking. In this work, we propose a novel Conformer layer as an alternative approach to scale TK to longer input sequences. Furthermore, we incorporate query term independence and explicit term matching to extend the model to the full retrieval setting. We benchmark our models under the strictly blind evaluation setting of the TREC 2020 Deep Learning track and find that our proposed architecture changes lead to improved retrieval quality over TKL. Our best model also outperforms all non-neural runs (“trad”) and two-thirds of the pretrained Transformer-based runs (“nnlm”) on NDCG@10.
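
To make the query term independence idea concrete, the following is a minimal sketch, not the model described above. It assumes only the general form of the assumption: the score of a document for a query is the sum of per-term scores, so those scores can be precomputed offline and looked up at query time. The functions `term_doc_score`, `build_index`, and `retrieve`, the log-scaled term-frequency scorer, and the toy documents are all hypothetical stand-ins for illustration.

```python
import math
from collections import defaultdict


def term_doc_score(term, doc_tokens):
    # Hypothetical stand-in for a learned per-term scorer:
    # log-scaled term frequency, log(1 + tf).
    tf = doc_tokens.count(term)
    return math.log1p(tf)


def build_index(docs):
    # Offline step: precompute a score for each (term, document) pair.
    # Assumption of this toy: only terms that occur in a document are scored.
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        tokens = text.lower().split()
        for term in set(tokens):
            index[term][doc_id] = term_doc_score(term, tokens)
    return index


def retrieve(query, index, k=10):
    # Online step: sum the precomputed per-term scores for each document.
    # No joint encoding of query and document is needed at query time.
    totals = defaultdict(float)
    for term in query.lower().split():
        for doc_id, score in index.get(term, {}).items():
            totals[doc_id] += score
    # Sort by descending score, breaking ties by document id.
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


docs = {
    "d1": "conformer layers scale ranking to long documents",
    "d2": "query term independence enables full retrieval",
    "d3": "long documents need efficient ranking",
}
index = build_index(docs)
results = retrieve("long documents ranking", index)
```

Because each query term contributes independently, the online cost in this sketch depends on the lengths of the posting lists for the query terms rather than on running a model over every query-document pair, which is what makes the assumption attractive for full retrieval over a large collection.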