Evaluating Fine-tuning Approaches for Duplicate Bug Report Detection

Luiz Eduardo Philippi Rosane; Robert Einer; Mert Yurdakul; Francisco Gomes de Oliveira Neto

doi:10.5753/sbes.2025.9809

Luiz Eduardo Philippi Rosane University of Gothenburg
Robert Einer University of Gothenburg
Mert Yurdakul Test Scouts
Francisco Gomes de Oliveira Neto Chalmers University of Technology / University of Gothenburg

Abstract

Bug reports are artefacts that document defects encountered by users or developers. Rapid testing and release cycles often lead to similar or near-duplicate bug reports, which introduce redundancy, delay triage, and increase maintenance overhead. Prior research has extensively explored automated methods to detect and manage duplicate bug reports. Because bug reports are written in natural language, recent advances in large language models (LLMs), particularly BERT and its successors, are a promising avenue for improving the robustness and accuracy of such methods. In this study, we investigate the use of LLMs to identify duplicate bug reports (DBRs), focusing on the impact of fine-tuning an all-mpnet-base-v2 model, which builds on BERT-based architectures while addressing several of their limitations. We fine-tuned the model using large, open-source bug tracking datasets from the Eclipse, OpenOffice, Firefox, and NetBeans projects. Our evaluation shows that fine-tuning yields only marginal performance improvements across all datasets. We also discuss the trade-offs involved in fine-tuning LLMs for this task, including hyperparameter tuning guidelines and the practical challenges posed by computational and financial costs.

Keywords: Duplicate Bug Reports, Fine-Tuning, Text Similarity
