Towards the Use of Machine Learning Algorithms to Enhance the Effectiveness of Search Strings in Secondary Studies

Leonardo Cairo; Glauco de F. Carneiro; Miguel P. Monteiro; Fernando Brito e Abreu

Leonardo Cairo, UNIFACS
Glauco de F. Carneiro, UNIFACS
Miguel P. Monteiro, UNL
Fernando Brito e Abreu

Abstract

Devising an appropriate Search String for a secondary study is not a trivial task, and the literature reports that researchers have difficulty identifying suitable keywords. A poorly chosen Search String may compromise the quality of a secondary study, either by missing relevant studies or, when irrelevant studies are retrieved, by adding unnecessary work to the subsequent steps. In this paper, we propose an approach for the creation and calibration of a Search String. We chose three published systematic literature reviews (SLRs) from Scopus and applied Machine Learning algorithms to create the corresponding Search Strings to be used in the SLRs. Comparing the results obtained with those published in the previous SLRs shows an increase in the precision of the reviews of up to 12%, with no loss of recall. To motivate future studies and replications, the tool implementing the proposed approach is available in a public repository, along with the dataset used in this paper.

Keywords: secondary studies, machine learning, natural language processing
