Generating Exploratory Test Suites with ChatGPT: Insights from an Empirical Study

Natália Salvino André; Everton L. G. Alves

doi:10.5753/sast.2025.14094

Natália Salvino André UFCG
Everton L. G. Alves UFCG

Abstract

Exploratory Testing (ET) plays a critical role in ensuring software quality by uncovering edge cases and enabling adaptive evaluation of systems. However, ET is often constrained by its reliance on tester expertise, challenges in reproducing failures, and limited testing resources. Generative Artificial Intelligence (GenAI), particularly ChatGPT, presents opportunities to support ET by automating tasks such as test case generation. This paper reports on an empirical study comparing exploratory test suites generated by ChatGPT using six image-based prompts with those created by human testers. We evaluated the quality of the generated suites, identified the most effective prompt formats, and gathered feedback from professional testers. Results show that ChatGPT can produce high-quality test suites, though it may miss complex faults and occasionally generate incoherent cases. The best performance was achieved using sub-prompts focused on specific features. Testers found the AI-generated suites clear and practical but noted gaps requiring complementary manual scenarios. These findings suggest that ChatGPT can serve as a valuable aid in ET, while highlighting areas for further improvement.

Keywords: Exploratory Testing, GenAI, ChatGPT, Manual Testing
