Comparing LLMs and Proposing an ML-Based Approach for Search String Generation in Systematic Literature Reviews

  • Diogo Adário Marassi PUC-Rio
  • Juliana Alves Pereira PUC-Rio
  • Katia Romero Felizardo UTFPR

Abstract

The formulation of an effective search string is a critical step in systematic literature reviews (SLRs), as it directly influences both the coverage and the precision of the retrieved studies. Traditionally, this process relies on manual keyword selection and expert-driven refinement, which makes it laborious, susceptible to human bias, and often inaccessible to non-specialists. To address these limitations, this study explores the application of artificial intelligence (AI) to support the generation of search strings. We organized our ongoing investigation into two main phases. In the first phase, we evaluated the performance of search strings generated by different large language models (LLMs), specifically Llama-8B, Gemma-12B, and Mistral-Nemo-12B, using a previously published SLR as a benchmark. Our results suggest that, while LLMs can assist in search string formulation, their effectiveness is inconsistent and sensitive to input conditions. Motivated by these limitations, we propose a semi-automated pipeline based on machine learning (ML). Building on our preliminary analysis, we also propose a standardized, reproducible evaluation framework to assess and compare AI-based search string generation strategies, including our proposed ML-based approach.
Keywords: Systematic Literature Review, Automation, Search String
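The abstract describes scoring generated search strings by the coverage and precision of the studies they retrieve, using the included studies of a previously published SLR as the benchmark. The sketch below illustrates one way such a comparison could be computed. It rests on several assumptions that do not come from the paper: "coverage" is read as recall against the benchmark, studies are matched by normalized DOI, strategies are ranked by recall first and precision second, and the names `score_search_string`, `compare_strategies` and `normalize_doi`, together with the example identifiers, are hypothetical. It is not the authors' evaluation framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetrievalScores:
    precision: float  # share of retrieved studies that belong to the benchmark
    recall: float     # read here as "coverage": share of benchmark studies retrieved
    f1: float         # harmonic mean of precision and recall


def normalize_doi(raw: str) -> str:
    """Lowercase a DOI and strip common resolver prefixes (assumed matching key)."""
    doi = raw.strip().lower()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
    return doi


def score_search_string(retrieved: list[str], benchmark: list[str]) -> RetrievalScores:
    """Score the studies returned by one search string against a benchmark SLR."""
    found = {normalize_doi(d) for d in retrieved}
    gold = {normalize_doi(d) for d in benchmark}
    hits = len(found & gold)
    precision = hits / len(found) if found else 0.0
    recall = hits / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return RetrievalScores(precision, recall, f1)


def compare_strategies(
    results: dict[str, list[str]], benchmark: list[str]
) -> list[tuple[str, RetrievalScores]]:
    """Rank strategies by recall, breaking ties by precision (an assumed policy)."""
    scored = [(name, score_search_string(ids, benchmark)) for name, ids in results.items()]
    return sorted(scored, key=lambda item: (item[1].recall, item[1].precision), reverse=True)


if __name__ == "__main__":
    # Hypothetical identifiers for illustration, not real study data.
    benchmark = ["10.1000/a", "10.1000/b", "10.1000/c", "10.1000/d"]
    retrieved = ["https://doi.org/10.1000/A", "10.1000/b", "10.1000/x"]
    s = score_search_string(retrieved, benchmark)
    print(f"precision={s.precision:.3f} recall={s.recall:.3f} f1={s.f1:.3f}")
    # prints: precision=0.667 recall=0.500 f1=0.571
```

Ranking by recall first reflects the common view that, in an SLR, missing a relevant study is usually costlier than screening a few extra ones; a framework that weighs precision more heavily would simply change the sort key.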

Published
22/09/2025
MARASSI, Diogo Adário; PEREIRA, Juliana Alves; FELIZARDO, Katia Romero. Comparing LLMs and Proposing an ML-Based Approach for Search String Generation in Systematic Literature Reviews. In: SIMPÓSIO BRASILEIRO DE ENGENHARIA DE SOFTWARE (SBES), 39., 2025, Recife/PE. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 852-858. ISSN 2833-0633. DOI: https://doi.org/10.5753/sbes.2025.11612.