research-article

AVS: An approach to identifying and mitigating duplicate bug reports

Authors:
Ivan Santos, Joelson Araújo, Cloves Lima, Ricardo B. C. Prudêncio, Flávia Barros

Universidade Federal de Pernambuco, Centro de Informática, Recife, Brasil

SBSI '18: Proceedings of the XIV Brazilian Symposium on Information Systems, June 2018, Article No.: 22, Pages 1–7, https://doi.org/10.1145/3229345.3229370

Published: 04 June 2018

ABSTRACT

In general, software enterprises adopt Error Reporting Management Systems during the production and testing process. The variety of information and the large amount of data stored in these systems lead to challenges related to the efficiency of error tracking, such as the presence of duplicate bug reports, which hinder productivity. Ideally, a tester should identify a duplicate error report before creating it. In this work, we propose AVS (Automatic Versatile Search tool), which contributes to the identification of duplicate error reports based on Information Retrieval and Text Mining techniques. As a proof of concept, we implemented AVS in the context of the Motorola Test Center (MTC) at the Informatics Center of UFPE. Every search for a new candidate error report is preprocessed. Then, the similarity between the new report and those available in the database is calculated, generating a list of reports ranked by similarity. Finally, the results are clustered to support a more advanced process of identifying potential duplicates. Experiments carried out on a corpus of about 750,000 reports have revealed the tool's usefulness in identifying duplicate error reports.
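The preprocessing and ranking steps described in the abstract can be illustrated with a minimal sketch. It assumes TF-IDF weighting with cosine similarity, which the abstract does not specify, so this is not the AVS implementation. The function names, the stopword list, and the sample reports are hypothetical, and the final clustering step is omitted.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (assumption; a real system would use a fuller one).
STOPWORDS = {"a", "an", "the", "is", "on", "in", "to", "when", "after", "of", "and"}


def preprocess(text):
    """Lowercase, split on non-alphanumeric characters, drop stopwords."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


def tfidf_vectors(token_lists):
    """Return one sparse TF-IDF vector (dict: term -> weight) per token list."""
    n = len(token_lists)
    df = Counter(term for tokens in token_lists for term in set(tokens))
    # Smoothed IDF so terms present in every document keep a positive weight.
    idf = {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}
    return [{t: c * idf[t] for t, c in Counter(tokens).items()} for tokens in token_lists]


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_candidates(new_report, stored_reports):
    """Return (report_id, score) pairs sorted by descending similarity."""
    ids = list(stored_reports)
    docs = [preprocess(stored_reports[i]) for i in ids] + [preprocess(new_report)]
    vectors = tfidf_vectors(docs)
    query = vectors[-1]  # the new report is the last document
    scored = [(i, cosine(query, vec)) for i, vec in zip(ids, vectors)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical stored reports and a new candidate report.
stored = {
    "BR-1": "Camera app crashes when switching to video mode",
    "BR-2": "Battery drains quickly after software update",
    "BR-3": "Crash in camera application on switching to video",
}
ranking = rank_candidates("Camera crashes after switching to video mode", stored)
for report_id, score in ranking:
    print(report_id, round(score, 3))
```

A tester would inspect the top of the ranked list before filing the new report. Note that without stemming, "crash" and "crashes" count as different terms, which is one reason a fuller preprocessing step matters in practice.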

Published in

SBSI '18: Proceedings of the XIV Brazilian Symposium on Information Systems
June 2018
578 pages
ISBN: 9781450365598
DOI: 10.1145/3229345

Copyright © 2018 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Duplicate bug reports
Information Retrieval
Text Clustering
Text Mining
Qualifiers
- research-article
- Research
- Refereed limited
Acceptance Rates
Overall Acceptance Rate: 181 of 557 submissions, 32%