Automatic Generation of Bug Reports Using Large Language Models: An Evaluation in a Software Institute

  • Lennon Chaves, Sidia Institute of Science and Technology
  • Davi Gonzaga, Sidia Institute of Science and Technology
  • Leonardo Tiago, Sidia Institute of Science and Technology
  • Ana Paula Silva, Sidia Institute of Science and Technology
  • Flávia Oliveira, Sidia Institute of Science and Technology

Abstract


Context: During software development, the test team is responsible for verifying whether the requirements were correctly implemented. When the software does not behave as expected, the tester must report the bug to the development team. For the problem to be fixed quickly and accurately, the bug report must be complete, containing all the information needed to fix the bug. This study was conducted in the context of a test team at a software institute that is responsible for testing mobile devices. Problem: Among its activities, the test team must write bug reports. However, a high volume of test requests results in many bug reports, which demand considerable time and effort. Goal: To automate the bug reporting process, we developed a system that automatically generates bug reports based on the Text-to-Text Transfer Transformer (T5) Large Language Model (LLM). To generate a bug report, the tester inputs a prompt containing a brief description of the bug. Method: We conducted an experiment with 8 members of the test team who write bug reports daily, in order to measure their perceptions of the solution we developed. The participants used the system and evaluated its outputs for the generation of 5 distinct bug reports. Results: Results showed a high acceptance rate for the system within the test team: for 3 of the 5 bug types included in the experiment, 87.5% of the outputs were considered valid or partially valid. Furthermore, participants highlighted the ease of use, the efficacy, and the writing quality of the bug reports generated by the system. However, they also noted that the system still needs to be adjusted to handle more types of issues, thereby reducing the need for manual fixes. Conclusions: We conclude that LLMs can be used to automatically generate bug reports; however, human review of the generated report remains indispensable.
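Although the paper's fine-tuned checkpoint and exact prompt format are not reproduced here, the minimal Python sketch below illustrates the workflow the abstract describes: a brief bug description goes in as a prompt, and a T5 model generates the report text. The model name ("t5-base" standing in for the team's fine-tuned checkpoint), the "generate bug report:" task prefix, and the generation parameters are assumptions for illustration, not the authors' actual configuration.

```python
# Sketch of the prompt-to-report flow described in the abstract,
# using the Hugging Face Transformers library.
# Assumptions: "t5-base" stands in for the team's fine-tuned checkpoint,
# and the "generate bug report:" task prefix is hypothetical.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

def generate_bug_report(description: str) -> str:
    # T5 is trained with task prefixes, so the tester's short bug
    # description is wrapped in a prefix that a fine-tuned model
    # would have seen during training.
    prompt = "generate bug report: " + description
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True)
    outputs = model.generate(**inputs, max_new_tokens=256, num_beams=4)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

print(generate_bug_report("camera app freezes when switching to video mode"))
```

Consistent with the paper's conclusion, any report produced this way would still require human review before being filed.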

Keywords: Bug Report, Large Language Models, Software Testing

Published
2025-11-04

CHAVES, Lennon; GONZAGA, Davi; TIAGO, Leonardo; SILVA, Ana Paula; OLIVEIRA, Flávia. Automatic Generation of Bug Reports Using Large Language Models: An Evaluation in a Software Institute. In: BRAZILIAN SOFTWARE QUALITY SYMPOSIUM (SBQS), 24., 2025, São José dos Campos/SP. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 337-345. DOI: https://doi.org/10.5753/sbqs.2025.13599.