WCL2R: A Benchmark Collection for Learning to Rank Research with Clickthrough Data


  • Otávio D. A. Alcântara, Federal University of Minas Gerais
  • Alvaro R. Pereira Jr., Federal University of Minas Gerais
  • Humberto M. Almeida, Universidade Federal de Minas Gerais
  • Marcos A. Gonçalves, Universidade Federal de Minas Gerais
  • Christian Middleton, Universitat Pompeu Fabra
  • Ricardo Baeza-Yates, Dept. of Computer Science, Universidad de Chile




Keywords: Benchmark, Clickthrough, Learning to Rank


In this paper we present WCL2R, a benchmark collection for supporting
research on learning to rank (L2R) algorithms that exploit clickthrough
features. Unlike other L2R benchmark collections, such as LETOR
and the recently released Yahoo! collection for an L2R competition,
WCL2R focuses on defining a significant (and new) set of features over
clickthrough data extracted from the logs of a real-world search engine.
We describe the WCL2R collection by providing details about
how the corpora, queries and relevance judgments were obtained, how the
learning features were constructed, and how the collection was split
into folds for representative learning. We also analyze the
discriminative power of the WCL2R collection using traditional feature
selection algorithms and show that the most discriminative features are, in fact, those
based on clickthrough data. We then compare several L2R algorithms on
WCL2R, showing that all of them obtain significant gains by exploiting
clickthrough information compared with traditional ranking approaches.






How to Cite

Alcântara, O. D. A., Pereira Jr., A. R., Almeida, H. M., Gonçalves, M. A., Middleton, C., & Baeza-Yates, R. (2010). WCL2R: A Benchmark Collection for Learning to Rank Research with Clickthrough Data. Journal of Information and Data Management, 1(3), 551. https://doi.org/10.5753/jidm.2010.1294



Regular Papers