New Brazilian Database for Recommendation Systems of Scientific Papers

  • João Vitor Felipe dos Santos Federal Institute of Goiás (IFG)
  • Ricardo Marçal de Andrade Nascimento Federal Institute of Goiás (IFG) http://orcid.org/0009-0003-7604-515X
  • Adriano César de Melo Camargo Federal Institute of Goiás (IFG)
  • Sergio Daniel Carvalho Canuto Federal Institute of Goiás (IFG)
  • Gustavo de Assis Costa Federal Institute of Goiás (IFG)
  • Daniel Xavier de Sousa Federal Institute of Goiás (IFG)

Abstract


This work presents a new dataset for Scientific Article Recommendation Systems (SARS). Existing datasets in the SARS context are scarce, and many of them rely solely on co-authorship relationships as a relevance criterion, overlooking explicit user evaluations. To address this issue, we propose a new dataset with explicitly defined relevance labels, comprising over 2,000 researchers, 30 areas of knowledge, and approximately 71,000 associated papers. We also characterize and evaluate the proposed dataset alongside other datasets widely used in the literature.
Keywords: Labeled Dataset, Scientific Paper Recommendation, Information Retrieval

Published
2025-09-29
DOS SANTOS, João Vitor Felipe; NASCIMENTO, Ricardo Marçal de Andrade; CAMARGO, Adriano César de Melo; CANUTO, Sergio Daniel Carvalho; COSTA, Gustavo de Assis; SOUSA, Daniel Xavier de. New Brazilian Database for Recommendation Systems of Scientific Papers. In: BRAZILIAN SYMPOSIUM ON DATABASES (SBBD), 40., 2025, Fortaleza/CE. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 289-302. ISSN 2763-8979. DOI: https://doi.org/10.5753/sbbd.2025.247079.