Towards the Use of Machine Learning Algorithms to Enhance the Effectiveness of Search Strings in Secondary Studies

  • Leonardo Cairo UNIFACS
  • Glauco de F. Carneiro UNIFACS
  • Miguel P. Monteiro UNL
  • Fernando Brito e Abreu


Devising an appropriate Search String for a secondary study is not a trivial task, and identifying suitable keywords has been reported in the literature as a difficulty faced by researchers. A poorly chosen Search String may compromise the quality of the secondary study, either by missing relevant studies or, when irrelevant studies are selected, by creating extra work in subsequent steps. In this paper, we propose an approach for the creation and calibration of a Search String. We chose three published systematic literature reviews (SLRs) from Scopus and applied Machine Learning algorithms to create the corresponding Search Strings to be used in the SLRs. Comparing the results obtained with those published in the previous SLRs shows an increase in the precision of the reviews of up to 12%, with no loss of recall. To motivate future studies and replications, the tool implementing the proposed approach is available in a public repository, along with the dataset used in this paper.
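To make the evaluation criteria concrete, the following minimal sketch shows how the precision and recall of a Search String could be measured against the set of studies included in a published SLR. It is an illustration only, not the tool described in the paper: the function name `precision_recall` and all study identifiers are hypothetical, and the numbers are invented to show the arithmetic.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) of a search result against a known relevant set.

    retrieved: identifiers returned by a Search String (hypothetical data).
    relevant:  identifiers of the studies included in the published SLR.
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    # Precision: share of retrieved studies that are relevant.
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    # Recall: share of relevant studies that were retrieved.
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: both strings find all four relevant studies,
# but the calibrated string returns fewer irrelevant ones.
relevant = {"s1", "s2", "s3", "s4"}
baseline = relevant | {"x1", "x2", "x3", "x4", "x5", "x6"}
calibrated = relevant | {"x1", "x2", "x3", "x4"}

print(precision_recall(baseline, relevant))
print(precision_recall(calibrated, relevant))
```

In this invented case the calibrated string raises precision while recall stays the same, which is the kind of outcome the abstract reports: fewer irrelevant studies to screen, with no relevant study lost.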

Keywords: secondary studies, machine learning, natural language processing


How to Cite

CAIRO, Leonardo; CARNEIRO, Glauco de F.; MONTEIRO, Miguel P.; BRITO E ABREU, Fernando. Towards the Use of Machine Learning Algorithms to Enhance the Effectiveness of Search Strings in Secondary Studies. In: SIMPÓSIO BRASILEIRO DE ENGENHARIA DE SOFTWARE (SBES), 33., 2019, Salvador. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2019.