ABSTRACT
Software enterprises generally adopt error reporting management systems during production and testing. The variety and volume of data stored in these systems create challenges for efficient error tracking, such as duplicate bug reports that hinder productivity. Ideally, a tester should identify a duplicate error report before creating it. In this work, we propose AVS (Automatic Versatile Search tool), which helps identify duplicate error reports using Information Retrieval and Text Mining techniques. As a proof of concept, we implemented AVS in the context of the Motorola Test Center (MTC) at the Informatics Center of UFPE. Each search for a new candidate error report is preprocessed. The similarity between the new report and those available in the database is then calculated, producing a list ranked by similarity. Finally, the results are clustered to further support the identification of potential duplicates. Experiments carried out on a corpus of about 750,000 reports have revealed the tool's usefulness in identifying duplicate error reports.
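The preprocessing and ranking steps described in the abstract can be illustrated with a minimal sketch. The abstract does not say which weighting scheme or similarity measure AVS uses, so TF-IDF weighting with cosine similarity is assumed here as a common Information Retrieval baseline. The stop-word list, the function names, and the sample reports are all hypothetical, and the final clustering step is left out.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list; a real deployment would use a fuller one.
STOP_WORDS = {"the", "a", "an", "on", "in", "when", "is", "to", "of", "after"}


def preprocess(text):
    """Lowercase, tokenise on alphanumerics, and drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def tfidf_vectors(token_lists):
    """Build one sparse TF-IDF vector (a dict) per tokenised document."""
    n_docs = len(token_lists)
    doc_freq = Counter(term for tokens in token_lists for term in set(tokens))
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        vectors.append({
            # Smoothed inverse document frequency (an assumed variant).
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_candidates(new_report, stored_reports):
    """Return (report_id, score) pairs sorted by descending similarity."""
    ids = list(stored_reports)
    token_lists = [preprocess(stored_reports[i]) for i in ids]
    token_lists.append(preprocess(new_report))  # the query goes last
    vectors = tfidf_vectors(token_lists)
    query = vectors[-1]
    # zip stops at len(ids), so the query is never compared with itself.
    scored = [(i, cosine(query, vec)) for i, vec in zip(ids, vectors)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical stored reports and a new candidate report.
stored = {
    "BUG-1": "Camera app crashes when switching to video mode",
    "BUG-2": "Battery drains quickly after update",
    "BUG-3": "Crash in camera when video mode is selected",
}
ranked = rank_candidates("Camera crashes on switching to video", stored)
for report_id, score in ranked:
    print(f"{report_id}: {score:.3f}")
```

In this sketch, a tester would look at the top of the ranked list before filing a new report. The report that shares the most distinctive terms with the candidate ranks first, and a report with no terms in common scores zero.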
Index Terms
- AVS: An approach to identifying and mitigating duplicate bug reports