How Software Testers Perceive Large Language Models: A Focus Group Study with Lexicometric Analysis

  • Izabella Silva (UFCG)
  • Mirko Perkusich (UFCG)
  • Danyllo Albuquerque (UFCG)
  • Kyller Gorgônio (UFCG)
  • Angelo Perkusich (UFCG)

Abstract


Context. Large Language Models (LLMs) are increasingly embedded in software-engineering toolchains, with potential to enhance quality-assurance tasks such as test-case generation, bug analysis, and traceability. Yet their industrial value remains unclear, partly because practicing testers’ perspectives are underexplored. Objective. We investigate how industry testers perceive the usefulness and limitations of LLMs across the Software Test Life Cycle (STLC), identifying (i) the most critical stages, (ii) operational bottlenecks, and (iii) perceived benefits, risks, and adoption conditions. Method. A focus group with five professional quality analysts (2–10 years’ experience) included hands-on interaction with an LLM-powered BDD notebook, followed by guided discussion and affinity-diagram voting. Forty-two statements were coded and prioritized. Results. Requirements Analysis and Test Planning were deemed most critical; 71% of votes linked rework to unclear or incomplete requirements. Reported benefits included automation of repetitive tasks, broader coverage, and faster learning; the main concerns were prompt sensitivity, limited domain generalization, and data-privacy risks, with participants emphasizing the need for human oversight and domain-adapted prompt libraries. Conclusion. While LLMs can improve efficiency and coverage, adoption depends on high-quality inputs, secure deployment, and early-phase integration. The findings offer empirically grounded guidance for aligning LLM solutions with the socio-technical realities of industrial testing teams.
Keywords: Large Language Model, Software Testing, Software Test Life Cycle, Prompt Engineering, Requirements Analysis

References

Sébastien Bubeck, Varun Chandrasekaran, Ronen Eldan, et al. 2023. Sparks of Artificial General Intelligence: Early Experiments with GPT-4. arXiv preprint arXiv:2303.12712 (2023). [link]

Vahid Garousi, Michael Felderer, and Marco Kuhrmann. 2020. Exploring the Industry’s Challenges in Software Testing: An Empirical Study. Journal of Software: Evolution and Process 32, 8 (2020), e2251. DOI: 10.1002/smr.2251

Fatih Gurcan, Gonca Gokce Menekse Dalveren, Nergiz Ercil Cagiltay, Dumitru Roman, and Ahmet Soylu. 2022. Evolution of software testing strategies and trends: Semantic content analysis of software research corpus of the last 40 years. IEEE Access 10 (2022), 106093–106109.

ISTQB. 2024. Certified Tester Foundation Level Syllabus v4.0. [link]. Accessed 4 Jul 2025.

Eriks Klotins, Tony Gorschek, Katarina Sundelin, and Erik Falk. 2022. Towards cost-benefit evaluation for continuous software engineering activities. Empirical Software Engineering 27, 6 (2022), 157.

Jyrki Kontio, Laura Lehtola, and Johanna Bragge. 2004. Using the focus group method in software engineering: obtaining practitioner and user experiences. In Proceedings of the 2004 International Symposium on Empirical Software Engineering (ISESE ’04). IEEE, 271–280.

Arghavan Moradi Dakhel, Amin Nikanjam, Vahid Majdinasab, Foutse Khomh, and Michel C. Desmarais. 2023. Effective Test Generation Using Pre-trained Large Language Models and Mutation Testing. arXiv preprint arXiv:2308.16557 (2023). [link]

David L. Morgan. 1997. Focus Groups as Qualitative Research (2nd ed.). SAGE Publications.

Mika V. Mäntylä, Juha Itkonen, and Joonas Iivonen. 2012. Who Tested My Software? Testing as an Organizationally Cross-Cutting Activity. Software Quality Journal 20, 1 (2012), 145–172. DOI: 10.1007/s11219-011-9157-4

Per Runeson and Martin Höst. 2009. Guidelines for Conducting and Reporting Case Study Research in Software Engineering. Empirical Software Engineering 14, 2 (2009), 131–164. DOI: 10.1007/s10664-008-9102-8

Janice Singer, Susan E. Sim, and Timothy C. Lethbridge. 2008. Software engineering data collection for field studies. In Guide to Advanced Empirical Software Engineering. Springer, 9–34.

Michele Tufano, Chuning Chen, and Miltiadis Allamanis. 2023. Large Language Models as “Big Assistants” for Test Generation: An Empirical Study. In Proceedings of the 38th IEEE/ACM International Conference on Automated Software Engineering (ASE ’23). IEEE, 1234–1245. DOI: 10.1109/ASE57332.2023.00098

Junjie Wang, Yuchao Huang, Chunyang Chen, Zhe Liu, Song Wang, and Qing Wang. 2024. Software Testing With Large Language Models: Survey, Landscape, and Vision. IEEE Trans. Softw. Eng. 50, 4 (April 2024), 911–936. DOI: 10.1109/TSE.2024.3368208

Zhiqiang Yuan, Yiling Lou, Mingwei Liu, Shiji Ding, Kaixin Wang, Yixuan Chen, and Xin Peng. 2024. No More Manual Tests? Evaluating and Improving ChatGPT for Unit Test Generation. arXiv preprint arXiv:2305.04207 (2024). [link]

Published
23/09/2025
SILVA, Izabella; PERKUSICH, Mirko; ALBUQUERQUE, Danyllo; GORGÔNIO, Kyller; PERKUSICH, Angelo. How Software Testers Perceive Large Language Models: A Focus Group Study with Lexicometric Analysis. In: WORKSHOP BRASILEIRO DE ENGENHARIA DE SOFTWARE INTELIGENTE (ISE), 4., 2025, Recife/PE. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 25-30. DOI: https://doi.org/10.5753/ise.2025.14878.