Comparative Analysis of Large Language Model Tools for Automated Test Data Generation from BDD

  • Isela Mendoza UFF
  • Fernando Silva Filho UFF
  • Gustavo Medeiros UFF
  • Aline Paes UFF
  • Vânia O. Neves UFF

Abstract


Automating processes reduces human workload, particularly in software testing, where automation enhances quality and efficiency. Behavior-driven development (BDD) focuses on software behavior to define and validate required functionalities, using tools that translate functional requirements into automated tests. However, creating BDD scenarios and their associated test data inputs is time-consuming and heavily dependent on a good input data set. Large Language Models (LLMs) such as Microsoft's Copilot, OpenAI's ChatGPT-3.5 and ChatGPT-4, and Google's Gemini offer a potential solution by automating test data generation. This study evaluates these LLMs' ability to understand BDD scenarios and generate corresponding test data across five scenarios ranked by complexity. It assesses each LLM's learning, assertiveness, and response structuring, as well as the quality, representativeness, and coverage of the generated test data. The results indicate that ChatGPT-4 and Gemini stand out as the tools that best met our expectations, showing promise for advancing the automation of test data generation from BDD scenarios.
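The workflow the abstract describes — feeding a BDD scenario to an LLM and collecting generated test data — can be illustrated with a minimal sketch. The Gherkin scenario, the prompt wording, and the `build_prompt`/`parse_test_data` helpers below are hypothetical illustrations, not the paper's actual tooling; the LLM call is replaced by a canned answer.

```python
# Hypothetical sketch: embed a Gherkin (BDD) scenario in a prompt asking an
# LLM for test data, then parse the answer into concrete test inputs.
# The scenario, prompt text, and helper names are illustrative assumptions.

GHERKIN_SCENARIO = """\
Feature: User login
  Scenario: Successful login with valid credentials
    Given a registered user with email "<email>" and password "<password>"
    When the user submits the login form
    Then the system grants access
"""

def build_prompt(scenario: str, n_cases: int = 3) -> str:
    """Compose a prompt asking an LLM to fill the scenario placeholders."""
    return (
        f"Given the following BDD scenario:\n{scenario}\n"
        f"Generate {n_cases} sets of test data (email, password) covering "
        "valid, invalid, and boundary cases. Answer with one CSV line per case."
    )

def parse_test_data(llm_answer: str) -> list[tuple[str, str]]:
    """Parse a CSV-style LLM answer into (email, password) pairs."""
    cases = []
    for line in llm_answer.strip().splitlines():
        email, password = (field.strip() for field in line.split(",", 1))
        cases.append((email, password))
    return cases

# A canned answer stands in for a real LLM response here:
canned = "alice@example.com, S3cret!\nbob@@example, \nx@y.z, a"
cases = parse_test_data(canned)
```

In the study itself, each of the four LLMs would receive such a prompt for every scenario, and the parsed outputs would then be judged for quality, representativeness, and coverage.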

Keywords: Test Automation, Large Language Models, Behavior-Driven Development, AI in Software Testing, Test Data Generation

References

Valentina Alto. 2023. Modern Generative AI with ChatGPT and OpenAI Models: Leverage the capabilities of OpenAI’s LLM for productivity and innovation with GPT3 and GPT4. Packt Publishing Ltd.

DAIR.AI. 2024. Prompt Engineering Guide. [link].

Yao Deng, Jiaohong Yao, Zhi Tu, Xi Zheng, Mengshi Zhang, and Tianyi Zhang. 2023. Target: Automated scenario generation from traffic rules for testing autonomous vehicles. arXiv preprint arXiv:2305.06018 (2023).

Angela Fan, Beliz Gokkaya, Mark Harman, Mitya Lyubarskiy, Shubho Sengupta, Shin Yoo, and Jie M Zhang. 2023. Large Language Models for Software Engineering: Survey and Open Problems. In 2023 IEEE/ACM International Conference on Software Engineering: Future of Software Engineering (ICSE-FoSE). 31–53.

Roger Ferguson and Bogdan Korel. 1996. The chaining approach for software test data generation. ACM Transactions on Software Engineering and Methodology (TOSEM) 5, 1 (1996), 63–86.

Dorothy Graham and Mark Fewster. 2012. Experiences of Test Automation: Case Studies of Software Test Automation. Addison-Wesley.

Zhe Liu, Chunyang Chen, Junjie Wang, Xing Che, Yuekai Huang, Jun Hu, and Qing Wang. 2023. Fill in the blank: Context-aware automated text input generation for mobile gui testing. In 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE). IEEE, 1355–1367.

Zhe Liu, Chunyang Chen, Junjie Wang, Mengzhuo Chen, Boyu Wu, Xing Che, Dandan Wang, and Qing Wang. 2023. Chatting with gpt-3 for zero-shot humanlike mobile automated gui testing. arXiv preprint arXiv:2305.09434 (2023).

Nicholas Nishimoto Marques and Rafael Alves Fernandes. 2020. Um arcabouço para a geração automatizada de testes funcionais a partir de cenários BDD. Bachelor's thesis in Computer Science.

Isela Mendoza, Fernando Silva Filho, Gustavo Medeiros, Aline Paes, and Vânia O. Neves. 2024. Data Repository for Comparative Analysis of LLM Tools in BDD Test Data Generation. [link].

Jean Carlos P. Miranda, Hugo T. Almeida, and Vânia O. Neves. 2018. PySoCA - Python Source-code Coverage and Analysis. In Anais do IX Congresso Brasileiro de Software (CBSoft 2018) - Sessão de Ferramentas. São Carlos/SP.

Glenford J Myers, Corey Sandler, and Tom Badgett. 2011. The Art of Software Testing. John Wiley & Sons.

Vânia O. Neves, Marcio E. Delamaro, and Paulo C. Masiero. 2017. Pateca: uma ferramenta de apoio ao teste estrutural de veículos autônomos. In Anais do VIII Congresso Brasileiro de Software (CBSoft 2017) - Sessão de Ferramentas. SBC, Porto Alegre, BR, 57–64.

Mohaimenul Azam Khan Raiaan, Md Saddam Hossain Mukta, Kaniz Fatema, Nur Mohammad Fahad, Sadman Sakib, Most Marufatul Jannat Mim, Jubaer Ahmad, Mohammed Eunus Ali, and Sami Azam. 2024. A Review on Large Language Models: Architectures, Applications, Taxonomies, Open Issues and Challenges. IEEE Access 12 (2024), 26839–26874.

Koushik Sen, Darko Marinov, and Gul Agha. 2005. CUTE: A Concolic Unit Testing Engine for C. ACM SIGSOFT Software Engineering Notes 30, 5 (2005), 263–272.

John Ferguson Smart and Jan Molak. 2023. BDD in Action: Behavior-driven development for the whole software lifecycle. Simon and Schuster.

Auri Marcelo Rizzo Vincenzi, Márcio Eduardo Delamaro, Arilo Claudio Dias Neto, Sandra Camargo Pinto Ferraz Fabbri, Mário Jino, and José Carlos Maldonado. 2018. Automatização de teste de software com ferramentas de software livre. Elsevier Brasil.

Junjie Wang, Yuchao Huang, Chunyang Chen, Zhe Liu, Song Wang, and Qing Wang. 2024. Software Testing With Large Language Models: Survey, Landscape, and Vision. IEEE Transactions on Software Engineering 50, 4 (2024), 911–936.

Guixin Ye, Zhenqiang Tang, Shin Hwei Tan, Songtao Huang, Dongdong Fang, Xiaoyang Sun, Lizhuang Bian, Haibo Wang, and Zhendong Su. 2021. Automated conformance testing for JavaScript engines via deep compiler fuzzing. In Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation (PLDI 2021). 435–450.

Ruilian Zhao and Qing Li. 2007. Automatic test generation for dynamic data structures. In 5th ACIS International Conference on Software Engineering Research, Management & Applications (SERA 2007). IEEE, 545–549.
Published
30/09/2024
MENDOZA, Isela; SILVA FILHO, Fernando; MEDEIROS, Gustavo; PAES, Aline; NEVES, Vânia O. Comparative Analysis of Large Language Model Tools for Automated Test Data Generation from BDD. In: SIMPÓSIO BRASILEIRO DE ENGENHARIA DE SOFTWARE (SBES), 38., 2024, Curitiba/PR. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2024. p. 280-290. DOI: https://doi.org/10.5753/sbes.2024.3423.