Automated Test Case Generation in a Real-World System Using a Customized AI Agent: An Experience Report
Abstract
Test case design is an essential activity in software quality assurance. When performed manually, however, it can be time-consuming, error-prone, and effort-intensive, particularly in complex applications. This experience report describes the development and application of an artificial intelligence agent, built on the ChatGPT platform, designed to automate test case generation for a real-world system and reduce the time it requires. The agent was configured to simulate the role of a QA analyst, using functional requirements, interface prototypes, and prompt engineering strategies to produce test scenarios with high coverage and accuracy. Experiments were conducted on one module of a component assembly control system, comparing manually created test cases with those generated by the agent. The results showed a reduction of over 50% in specification time while maintaining the quality and coverage of the scenarios. This paper details the agent’s configuration, the results achieved, the challenges encountered, and the lessons learned, contributing evidence for the practical use of generative AI in software quality assurance.
