On the Identification of Self-Admitted Technical Debt with Large Language Models

  • Pedro Lambert (PUC Minas)
  • Lucila Ishitani (PUC Minas)
  • Laerte Xavier (PUC Minas)

Abstract


Self-Admitted Technical Debt (SATD) refers to the common practice in software engineering of developers explicitly documenting and acknowledging technical debt within their projects. Identifying SATD in various contexts is a key activity for effective technical debt management and resolution. While previous research has focused on natural language processing techniques and specialized models for SATD identification, this study explores the potential of Large Language Models (LLMs) for this task. We compare the performance of three LLMs (Claude 3 Haiku, GPT-3.5 Turbo, and Gemini 1.0 Pro) against the generalization performance of the state-of-the-art model designed for SATD identification. Additionally, we investigate the impact of prompt engineering on the performance of LLMs in this context. Our findings reveal that the LLMs achieve competitive results compared to the state-of-the-art model. However, when considering the Matthews Correlation Coefficient (MCC), we observe that the LLMs' performance is less balanced, tending to score lower than the state-of-the-art model across all four confusion matrix categories. Nevertheless, we conclude that, with a well-designed prompt, the models' bias can be mitigated, resulting in a higher MCC score.
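
As context for the evaluation described above, the Matthews Correlation Coefficient over a binary confusion matrix is MCC = (TP·TN − FP·FN) / √((TP+FP)(TP+FN)(TN+FP)(TN+FN)), which ranges from −1 to +1 and accounts for all four confusion-matrix cells. The sketch below is only an illustration of this kind of prompt-based SATD classification and MCC scoring, not the authors' code: the prompt wording, the example comments, and the llm_complete() placeholder (standing in for a call to a model such as Claude 3 Haiku, GPT-3.5 Turbo, or Gemini 1.0 Pro) are hypothetical.

```python
# Illustrative sketch (not the authors' pipeline): label code comments as SATD
# with an LLM prompt, then evaluate the predictions with MCC.
import math

PROMPT_TEMPLATE = (
    "Classify the following code comment as SATD (self-admitted technical debt) "
    "or NOT_SATD. Answer with a single label.\n\nComment: {comment}\nLabel:"
)

def llm_complete(prompt: str) -> str:
    """Placeholder for a real LLM API call; here a trivial keyword heuristic
    stands in so the example runs offline."""
    return "SATD" if "todo" in prompt.lower() or "hack" in prompt.lower() else "NOT_SATD"

def classify(comment: str) -> bool:
    """Return True if the (placeholder) LLM labels the comment as SATD."""
    return llm_complete(PROMPT_TEMPLATE.format(comment=comment)).strip().upper() == "SATD"

def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews Correlation Coefficient for a binary confusion matrix."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

if __name__ == "__main__":
    labeled = [  # (comment, ground-truth SATD label); toy examples only
        ("TODO: temporary hack, refactor this parser later", True),
        ("Computes the checksum of the payload", False),
    ]
    tp = tn = fp = fn = 0
    for comment, is_satd in labeled:
        pred = classify(comment)
        tp += pred and is_satd
        tn += (not pred) and (not is_satd)
        fp += pred and (not is_satd)
        fn += (not pred) and is_satd
    print(f"MCC = {mcc(tp, tn, fp, fn):.2f}")
```

Because MCC depends on all four confusion-matrix cells, a classifier that over-predicts one class scores poorly on it even when precision or recall looks competitive, which is why prompt adjustments that reduce such bias translate into higher MCC.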

Keywords: Self-Admitted Technical Debt, Large Language Models, Prompt Engineering

Published
30/09/2024
LAMBERT, Pedro; ISHITANI, Lucila; XAVIER, Laerte. On the Identification of Self-Admitted Technical Debt with Large Language Models. In: SIMPÓSIO BRASILEIRO DE ENGENHARIA DE SOFTWARE (SBES), 38., 2024, Curitiba/PR. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2024. p. 648-654. DOI: https://doi.org/10.5753/sbes.2024.3588.