Automatic Classification of Bug Reports for Mobile Devices: An Industrial Case Study

Renata F. Lins; Flávia A. Barros; Ricardo B. C. Prudêncio; Wallace N. Melo

doi:10.5753/eniac.2022.227555

Renata F. Lins (UFPE)
Flávia A. Barros (UFPE)
Ricardo B. C. Prudêncio (UFPE)
Wallace N. Melo (UFPE / Motorola)

Abstract

When a failure is found during software testing, a bug report (BR) is written and stored in a product management tool. To prioritize the errors to fix, a BR triage process is performed to identify the most critical ones. This is especially relevant for mobile applications, whose fast development cycle results in a high number of BRs to evaluate daily. This paper investigates Machine Learning (ML) and Natural Language Processing (NLP) techniques for automatically classifying the criticality of BRs in a real mobile environment, and presents a prototype. Results on a corpus of 9,785 BRs were satisfactory, reaching an AUC of up to 0.79 and meeting the performance level required by the target application.
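
To make the reported metric concrete, the following minimal sketch computes AUC as the fraction of (critical, non-critical) BR pairs in which the critical BR receives the higher score. It is only an illustration: the function name `roc_auc`, the toy labels, and the toy scores are assumptions made for this example and are not taken from the paper's data or prototype.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a pair.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one example of each class")

    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


# Hypothetical toy data: 1 = critical BR, 0 = non-critical BR.
toy_labels = [1, 1, 1, 0, 0, 0, 0]
# Hypothetical classifier scores (higher = more likely to be critical).
toy_scores = [0.9, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]

toy_auc = roc_auc(toy_labels, toy_scores)
print(f"AUC on toy data: {toy_auc:.2f}")  # prints "AUC on toy data: 0.75"
```

Under this reading, an AUC of 0.79 means that a randomly chosen critical BR would be scored above a randomly chosen non-critical BR about 79% of the time, whereas a classifier that assigns scores at random would be expected to reach 0.5.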

