DOI: 10.1145/3229345.3229370
research-article

AVS: An approach to identifying and mitigating duplicate bug reports

Published: 04 June 2018

ABSTRACT

In general, software enterprises adopt Error Reporting Management Systems during the production and testing process. The variety of information and the large amount of data stored in these systems lead to challenges for the efficiency of error tracking, such as duplicate bug reports that hinder productivity. Ideally, a tester should identify a duplicate error report before creating it. In this work, we propose AVS (Automatic Versatile Search tool), which supports the identification of duplicate errors using Information Retrieval and Text Mining techniques. As a proof of concept, we implemented AVS in the context of the Motorola Test Center (MTC) at the Informatics Center of UFPE. Every search for a new candidate error report is preprocessed. Then, the similarity between the new report and those available in the database is calculated, producing a list ranked by similarity. Finally, the results are clustered to provide a more advanced process for identifying potential duplicates. Experiments carried out on a corpus of about 750,000 reports have revealed the tool's usefulness in identifying duplicate error reports.
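
The preprocessing and ranking steps named in the abstract can be illustrated with a minimal sketch. It is not the AVS implementation: the abstract does not specify the weighting scheme or similarity measure, so the sketch assumes smoothed TF-IDF weights and cosine similarity, and every name in it (`preprocess`, `tfidf_vectors`, `rank_candidates`, the stop-word list, the `BR-*` report ids and texts) is hypothetical. The final clustering step is left out.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list; a real deployment would use a fuller one.
STOP_WORDS = {"a", "an", "the", "is", "on", "in", "of", "to", "when", "and"}


def preprocess(text):
    """Lowercase, split on non-alphanumeric characters, drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def tfidf_vectors(token_lists):
    """Build one sparse TF-IDF vector (a dict) per token list."""
    n_docs = len(token_lists)
    doc_freq = Counter()
    for tokens in token_lists:
        doc_freq.update(set(tokens))
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        # Assumed weighting: raw term count times a smoothed IDF.
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_candidates(new_report, stored_reports):
    """Return (report_id, score) pairs sorted by descending similarity."""
    ids = list(stored_reports)
    token_lists = [preprocess(stored_reports[i]) for i in ids]
    # The new report is included when computing document frequencies.
    token_lists.append(preprocess(new_report))
    vectors = tfidf_vectors(token_lists)
    query = vectors[-1]
    scored = [(i, cosine(query, vec)) for i, vec in zip(ids, vectors[:-1])]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Made-up reports, for illustration only.
stored = {
    "BR-1": "Camera app crashes when switching to video mode",
    "BR-2": "Battery drains quickly while phone is idle",
    "BR-3": "Crash in camera application on switching to video",
}
ranked = rank_candidates("Camera crashes after switching to video mode", stored)
for report_id, score in ranked:
    print(report_id, round(score, 3))
```

With these made-up reports, BR-1 ranks first, BR-3 second, and BR-2 scores zero because it shares no terms with the new report. A tester would review the top of such a list before filing a new report.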

        • Published in

          SBSI '18: Proceedings of the XIV Brazilian Symposium on Information Systems
          June 2018
          578 pages
          ISBN:9781450365598
          DOI:10.1145/3229345

          Copyright © 2018 ACM

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Qualifiers

          • research-article
          • Research
          • Refereed limited

          Acceptance Rates

          Overall acceptance rate: 181 of 557 submissions, 32%
