Fractional Similarity: Cross-lingual Feature Selection for Search

  • Jagadeesh Jagarlamudi
  • Paul Bennett

Proceedings of the 33rd Annual European Conference on Information Retrieval (ECIR 2011)

Training data, as well as supplementary data such as usage-based click behavior, may abound in one search market (i.e., a particular region, domain, or language) and be much scarcer in another. Transfer methods attempt to improve performance in these resource-scarce markets by leveraging data across markets. However, differences in feature distributions across markets can change the optimal model. We introduce a method called Fractional Similarity, which uses query-based variance within a market to obtain more reliable estimates of feature deviations across markets. An empirical analysis demonstrates that using this scoring method as a feature selection criterion in cross-lingual transfer improves relevance ranking in the foreign language and compares favorably to a baseline based on KL divergence.
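
To make the KL divergence baseline concrete, the following is a minimal sketch of how a feature could be scored by how much its distribution differs between two markets. It does not implement Fractional Similarity, and the abstract does not say how the baseline was computed; the histogram binning, the smoothing constant, and all function and feature names here are assumptions chosen for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple


def histogram(values: Sequence[float], lo: float, hi: float,
              bins: int, eps: float = 1e-6) -> List[float]:
    """Smoothed, normalized histogram of `values` over [lo, hi]."""
    counts = [0] * bins
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if all values are equal
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    total = len(values) + eps * bins
    # Additive smoothing (an assumption) keeps every bin strictly positive.
    return [(c + eps) / total for c in counts]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for two discrete distributions over the same bins."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def feature_kl(source_vals: Sequence[float], target_vals: Sequence[float],
               bins: int = 10) -> float:
    """KL divergence between one feature's source and target distributions."""
    lo = min(min(source_vals), min(target_vals))
    hi = max(max(source_vals), max(target_vals))
    p = histogram(source_vals, lo, hi, bins)
    q = histogram(target_vals, lo, hi, bins)
    return kl_divergence(p, q)


def rank_features(source: Dict[str, List[float]],
                  target: Dict[str, List[float]],
                  bins: int = 10) -> List[Tuple[str, float]]:
    """Rank shared features from most to least similar across markets."""
    shared = sorted(set(source) & set(target))
    scores = [(name, feature_kl(source[name], target[name], bins))
              for name in shared]
    return sorted(scores, key=lambda item: item[1])


if __name__ == "__main__":
    # Hypothetical feature values for a source and a target market.
    source = {"bm25": [0.1, 0.2, 0.3, 0.4], "clicks": [0.0, 0.1, 0.1, 0.2]}
    target = {"bm25": [0.1, 0.2, 0.3, 0.4], "clicks": [0.8, 0.9, 0.9, 1.0]}
    for name, score in rank_features(source, target):
        print(name, round(score, 3))
```

A selection step under this sketch would keep only the features whose score falls below some threshold. Note that this baseline pools all feature values in a market and ignores which query each value came from, whereas the abstract states that Fractional Similarity uses query-based variance within a market.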