A Fast and Scalable Manycore Implementation for an On-Demand Learning to Rank Method
Resumo
Learning to rank (L2R) works by constructing a ranking model from training data so that, given unseen data (query), a somewhat similar ranking is produced. Almost all work in L2R focuses on ranking accuracy leaving performance and scalability overlooked. In this work we present a fast and scalable manycore (GPU) implementation for an on-demand L2R technique that builds ranking models on the fly. Our experiments show that we are able to process a query (build a model and rank) in only a few milliseconds, achieving a speedup of 508x over a serial baseline and 4x over a parallel baseline for the best case. We extend the implementation to work with multiple GPUs, further increasing the speedup over the parallel baseline to approximately x16 when using 4 GPUs.Referências
Agrawal, R., Imieliínski, T., and Swami, A. (1993). Mining associ ation rules between sets of items in large databases. ACM SIGMOD Record.
Atreas, N. and Karanikas, C. (2007). A fast pattern matching algorithm based on prime numbers and hashing approximation. NATO Security through Science Series -D: Information and Communication Security.
Buckles, B. P. and Lybanon, M. (1977). Algorithm 515: Generation of a vector from the lexicographical index [g6]. TOMS.
Chapelle, O., Chang, Y., and Liu, T.-Y. (2011). Future directions in learning to rank. In Yahoo! Learning to Rank Challenge.
De Sousa, D. X., Rosa, T. C., Martins, W. S., Silva, R., and Gonçalves, M. A. (2012). Improving on-demand learning to rank through parallelism. In WISE.
Duh, K. and Kirchhoff, K. (2008). Learning to rank with partially-labeled data. In SIGIR. ACM.
Geng, X., Liu, T.-Y., Qin, T., Arnold, A., Li, H., and Shum, H.-Y. (2008). Query dependent ranking using k-nearest neighbor. In SIGIR. ACM.
Harris, M. et al. (2007). Optimizing parallel reduction in cuda.
Hastie, T., Tibshirani, R., and Friedman, R. (2009). The Elements of NVIDIA Developer Technology. Statistical Learning.
Henneges, C., Hinselmann, G., Jung, S., Madlung, J., Schütz, W., Nordheim, A., and Zell, A. (2011). Ranking methods for the prediction of frequent top scoring peptides from proteomics data. Journal of Proteomics & Bioinformatics. [Jin et al. 2015] Jin, J., Cai, X., Lai, G., and Lin, X. (2015). Gpu-accelerated parallel algorithms for linear ranksvm. The Journal of Supercomputing.
Liu, T.-Y. (2009). Learning to rank for information retrieval. Foundations and Trends in Information Retrieval.
Luitjens, J. (2013). Cuda pro tip: Increase performance with vectorized https://devblogs.nvidia.com/parallelforall/memory access. cuda-pro-tip-increase-performance-with-vectorizedmemory-access/, acessed in june 2016.
Lv, Y., Moon, T., Kolari, P., Zheng, Z., Wang, X., and Chang, Y. (2011). Learning to model relatedness for news recommendation. In WWW. ACM.
McCaffrey, J. (2004). Generating the mth Lexicographical Element of a Mathematical Combination. http://msdn.microsoft.com/en-us/ library/aa289166(v=vs.71).aspx, acessed in june 2016.
Page, L., Brin, S., Motwani, R., and Winograd, T. (1999). The pager ank citation ranking: bringing order to the web.
Qin, T., Liu, T.-Y., Xu, J., and Li, H. (2010). Letor: A benchmark collection for research on learning to rank for information retrieval. Information Retrieval. [Shukla et al. 2012] Shukla, S., Lease, M., and Tewari, A. (2012). Parallelizing listnet training using spark. In SIGIR. ACM.
Silva, R., Gonçalves, M. A., and Veloso, A. (2011). Rule-based active sampling for learning to rank. In ECML PKDD. Springer.
Upadhyaya, S. R. (2013). Parallel approaches to machine learning—a comprehensive survey. JPDC.
Veloso, A. A., Almeida, H. M., Gonçalves, M. A., and Meira Jr, W. (2008). Learning to rank at query-time using association rules. In SIGIR. ACM.
Wang, S., Wu, Y., Gao, B. J., Wang, K., Lauw, H. W., and Ma, J. (2015). A cooperative coevolution framework for parallel learning to rank. TKDE. [Yue et al. 2007] Yue, Y., Finley, T., Radlinski, F., and Joachims, T. (2007). A support vector method for optimizing average precision. In SIGIR. ACM.
Atreas, N. and Karanikas, C. (2007). A fast pattern matching algorithm based on prime numbers and hashing approximation. NATO Security through Science Series -D: Information and Communication Security.
Buckles, B. P. and Lybanon, M. (1977). Algorithm 515: Generation of a vector from the lexicographical index [g6]. TOMS.
Chapelle, O., Chang, Y., and Liu, T.-Y. (2011). Future directions in learning to rank. In Yahoo! Learning to Rank Challenge.
De Sousa, D. X., Rosa, T. C., Martins, W. S., Silva, R., and Gonçalves, M. A. (2012). Improving on-demand learning to rank through parallelism. In WISE.
Duh, K. and Kirchhoff, K. (2008). Learning to rank with partially-labeled data. In SIGIR. ACM.
Geng, X., Liu, T.-Y., Qin, T., Arnold, A., Li, H., and Shum, H.-Y. (2008). Query dependent ranking using k-nearest neighbor. In SIGIR. ACM.
Harris, M. et al. (2007). Optimizing parallel reduction in cuda.
Hastie, T., Tibshirani, R., and Friedman, R. (2009). The Elements of NVIDIA Developer Technology. Statistical Learning.
Henneges, C., Hinselmann, G., Jung, S., Madlung, J., Schütz, W., Nordheim, A., and Zell, A. (2011). Ranking methods for the prediction of frequent top scoring peptides from proteomics data. Journal of Proteomics & Bioinformatics. [Jin et al. 2015] Jin, J., Cai, X., Lai, G., and Lin, X. (2015). Gpu-accelerated parallel algorithms for linear ranksvm. The Journal of Supercomputing.
Liu, T.-Y. (2009). Learning to rank for information retrieval. Foundations and Trends in Information Retrieval.
Luitjens, J. (2013). Cuda pro tip: Increase performance with vectorized https://devblogs.nvidia.com/parallelforall/memory access. cuda-pro-tip-increase-performance-with-vectorizedmemory-access/, acessed in june 2016.
Lv, Y., Moon, T., Kolari, P., Zheng, Z., Wang, X., and Chang, Y. (2011). Learning to model relatedness for news recommendation. In WWW. ACM.
McCaffrey, J. (2004). Generating the mth Lexicographical Element of a Mathematical Combination. http://msdn.microsoft.com/en-us/ library/aa289166(v=vs.71).aspx, acessed in june 2016.
Page, L., Brin, S., Motwani, R., and Winograd, T. (1999). The pager ank citation ranking: bringing order to the web.
Qin, T., Liu, T.-Y., Xu, J., and Li, H. (2010). Letor: A benchmark collection for research on learning to rank for information retrieval. Information Retrieval. [Shukla et al. 2012] Shukla, S., Lease, M., and Tewari, A. (2012). Parallelizing listnet training using spark. In SIGIR. ACM.
Silva, R., Gonçalves, M. A., and Veloso, A. (2011). Rule-based active sampling for learning to rank. In ECML PKDD. Springer.
Upadhyaya, S. R. (2013). Parallel approaches to machine learning—a comprehensive survey. JPDC.
Veloso, A. A., Almeida, H. M., Gonçalves, M. A., and Meira Jr, W. (2008). Learning to rank at query-time using association rules. In SIGIR. ACM.
Wang, S., Wu, Y., Gao, B. J., Wang, K., Lauw, H. W., and Ma, J. (2015). A cooperative coevolution framework for parallel learning to rank. TKDE. [Yue et al. 2007] Yue, Y., Finley, T., Radlinski, F., and Joachims, T. (2007). A support vector method for optimizing average precision. In SIGIR. ACM.
Publicado
05/10/2016
Como Citar
F. E FREITAS, Mateus; DE SOUSA, Daniel; MARTINS, Wellington; ROSA, Thierson; SILVA, Rodrigo; GONÇALVES, Marcos.
A Fast and Scalable Manycore Implementation for an On-Demand Learning to Rank Method. In: SIMPÓSIO EM SISTEMAS COMPUTACIONAIS DE ALTO DESEMPENHO (SSCAD), 17. , 2016, Aracajú.
Anais [...].
Porto Alegre: Sociedade Brasileira de Computação,
2016
.
p. 133-144.
DOI: https://doi.org/10.5753/wscad.2016.14254.