DOI: 10.1145/3350768.3350772
short-paper

Towards the Use of Machine Learning Algorithms to Enhance the Effectiveness of Search Strings in Secondary Studies

Published: 23 September 2019

ABSTRACT

Devising an appropriate Search String for a secondary study is not a trivial task, and identifying suitable keywords has been reported in the literature as a difficulty faced by researchers. A poorly chosen Search String may compromise the quality of the secondary study, either by missing relevant studies or, when irrelevant studies are selected, by causing extra work in the subsequent steps of the study. In this paper, we propose an approach for the creation and calibration of a Search String. We chose three published systematic literature reviews (SLRs) from Scopus and applied Machine Learning algorithms to create the corresponding Search Strings to be used in the SLRs. Comparison of the results obtained with those published in the previous SLRs shows an increase in recall of revisions of up to 12%, with no loss of recall. To motivate future studies and replications, the tool implementing the proposed approach is available in a public repository, along with the dataset used in this paper.
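The two failure modes named in the abstract (missing relevant studies, and extra work caused by irrelevant ones) are commonly quantified as recall and precision. The sketch below is only an illustration of those two measures, not the authors' evaluation code; the study identifiers and both result sets are hypothetical.

```python
def recall_precision(retrieved, relevant):
    """Return (recall, precision) of a search result against a known relevant set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    # Recall: share of the relevant studies that the search string found.
    recall = len(hits) / len(relevant) if relevant else 0.0
    # Precision: share of the retrieved studies that are actually relevant.
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    return recall, precision


# Hypothetical data: studies included in a published SLR ...
relevant = {"s1", "s2", "s3", "s4"}
# ... and the results of two hypothetical search strings.
baseline = ["s1", "s2", "s3", "x1", "x2", "x3", "x4", "x5"]
candidate = ["s1", "s2", "s3", "s4", "x1", "x2"]

for name, result in (("baseline", baseline), ("candidate", candidate)):
    r, p = recall_precision(result, relevant)
    print(f"{name}: recall={r:.2f} precision={p:.2f}")
```

A string with low recall misses studies that belong in the review, while a string with low precision returns many studies that must be screened out by hand.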



      Reviews

      Jonathan P. E. Hodgson

      The authors propose using text mining to improve the creation of search strings for so-called secondary studies, that is to say, survey articles. The primary measures of success in this endeavor should be recall (that is, retrieving a high proportion of relevant articles) and "workload" (that is, reducing a researcher's work while improving retrieval). The paper specifies, in detail, the complete criteria used. The authors use the Scopus document retrieval site as the repository for potential articles, mainly because Scopus provides an application programming interface (API) to the repository. Extending the ideas to multiple repositories is left as future work. After retrieving a set of articles, the authors use a specially written tool to obtain a list of words that characterizes the retrieved documents. To illustrate the ideas, three specific systematic literature reviews are used as input for the system. The sought-after search string should retrieve the articles in the systematic reviews while adding further relevant articles. The algorithms used in the analysis of retrieved documents include term frequency-inverse document frequency, continuous bag-of-words (CBOW), and skip-gram models. The paper describes results for the three survey papers and presents the recommended search strings. In a welcome move, the tool implementing the approach is available on a public website. A minor but annoying nit: numbers are used to denote references in the body of the paper, but they are not used in the bibliography. The paper is clearly (if densely) written and should be of interest to researchers looking to create a secondary study.
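The review mentions term frequency-inverse document frequency as one of the techniques used to find words that characterize a set of retrieved documents. The sketch below shows one simple way such a ranking could feed a search string. It is not the authors' tool: the scoring variant (raw term count in the seed documents times log inverse document frequency), the stopword list, the sample texts, and the function names `top_terms` and `build_search_string` are all assumptions made for illustration, and no Scopus API call is involved.

```python
import math
import re
from collections import Counter

# Small illustrative stopword list (an assumption, not from the paper).
STOPWORDS = {"a", "and", "for", "in", "of", "the"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def top_terms(seed_docs, background_docs, k=3):
    """Rank seed-document terms by term count times inverse document frequency."""
    doc_sets = [set(tokenize(d)) for d in seed_docs + background_docs]
    n_docs = len(doc_sets)
    term_counts = Counter(t for d in seed_docs for t in tokenize(d))

    def idf(term):
        doc_freq = sum(1 for d in doc_sets if term in d)
        return math.log(n_docs / doc_freq)

    scores = {t: count * idf(t) for t, count in term_counts.items()}
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


def build_search_string(terms):
    """Join the terms into a simple OR query with quoted keywords."""
    return " OR ".join(f'"{t}"' for t in terms)


# Hypothetical texts standing in for abstracts of known relevant studies ...
seed = [
    "continuous delivery pipeline problems",
    "continuous delivery adoption causes",
    "deployment pipeline for continuous delivery",
]
# ... and for unrelated studies from the same repository.
background = [
    "software testing survey",
    "requirements engineering survey",
    "software problems in testing",
]

print(build_search_string(top_terms(seed, background, k=3)))
```

Terms that are frequent in the seed documents but rare elsewhere score highest, which is the property that makes them candidate keywords. A real calibration loop would then run the string against the repository and check how many of the known studies it retrieves.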

      • Published in

        SBES '19: Proceedings of the XXXIII Brazilian Symposium on Software Engineering
        September 2019
        583 pages
        ISBN: 9781450376518
        DOI: 10.1145/3350768

        Copyright © 2019 ACM


        Publisher

        Association for Computing Machinery

        New York, NY, United States


        Qualifiers

        • short-paper
        • Research
        • Refereed limited

        Acceptance Rates

        SBES '19 Paper Acceptance Rate: 67 of 153 submissions, 44%
        Overall Acceptance Rate: 147 of 427 submissions, 34%
