A Case Study on Test Case Construction with Large Language Models: Unveiling Practical Insights and Challenges

Roberto F. Lima Jr (CESAR); Luiz Fernando P. B. Presta (CESAR School); Lucca S. Borborema (CESAR School); Vanderson N. Silva (CESAR); Marcio L. M. Dahia (CESAR); Anderson Santos (CESAR)

DOI: https://doi.org/10.5753/cibse.2024.28465

Abstract

This study examines the use of Large Language Models (LLMs) to construct test cases in software engineering, exploring their potential to make test generation more efficient and effective. Given the sophisticated natural language processing abilities of LLMs, the research conducts a detailed case study on a representative software application to evaluate how practical they are for creating detailed and accurate test scenarios. The investigation focuses on the challenges and advantages of LLMs in test case development, assessing their impact on the comprehensiveness and accuracy of the resulting tests and on the process of formulating them. By providing a nuanced understanding of the role of LLMs in software testing, the paper aims to inform practitioners and researchers about their potential and limitations, offering insights into their application in real-world testing environments and their contribution to advancing software testing methodologies.
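The exact prompts and test case template used in the case study are not given in this summary. As an illustration only, the sketch below shows one way to ask an LLM for structured test cases and to check its reply for completeness before a tester reviews the results. The field names in `REQUIRED_FIELDS`, the prompt wording, and the helpers `build_prompt` and `parse_test_cases` are all hypothetical. The model call itself is deliberately left out, so the reply string could come from any LLM client.

```python
import json
from dataclasses import dataclass

# Hypothetical test case template; these field names are an assumption,
# not the template used in the study.
REQUIRED_FIELDS = ("id", "title", "preconditions", "steps", "expected_result")


@dataclass
class TestCase:
    id: str
    title: str
    preconditions: list[str]
    steps: list[str]
    expected_result: str


def build_prompt(feature_description: str, max_cases: int = 5) -> str:
    """Assemble a hypothetical prompt that asks an LLM for test cases as JSON."""
    schema = ", ".join(REQUIRED_FIELDS)
    return (
        "You are a software QA engineer.\n"
        f"Write up to {max_cases} test cases for the feature below.\n"
        f"Answer with a JSON array of objects with the keys: {schema}.\n\n"
        f"Feature:\n{feature_description}"
    )


def parse_test_cases(llm_output: str) -> tuple[list[TestCase], list[str]]:
    """Parse the model's JSON reply, keeping complete cases and listing problems."""
    try:
        items = json.loads(llm_output)
    except json.JSONDecodeError as exc:
        return [], [f"invalid JSON: {exc.msg}"]
    if not isinstance(items, list):
        return [], ["expected a JSON array"]
    cases, problems = [], []
    for index, item in enumerate(items):
        if not isinstance(item, dict):
            problems.append(f"item {index}: not an object")
            continue
        # A field counts as missing if it is absent or empty (e.g. no steps).
        missing = [key for key in REQUIRED_FIELDS if not item.get(key)]
        if missing:
            problems.append(f"item {index}: missing {', '.join(missing)}")
            continue
        cases.append(TestCase(**{key: item[key] for key in REQUIRED_FIELDS}))
    return cases, problems


# Usage with a hand-written reply standing in for real model output.
reply = (
    '[{"id": "TC-01", "title": "Valid login", "preconditions": ["User exists"],'
    ' "steps": ["Open login page", "Submit valid credentials"],'
    ' "expected_result": "Dashboard is shown"},'
    ' {"id": "TC-02", "title": "Empty password"}]'
)
cases, problems = parse_test_cases(reply)
print(len(cases), problems)
# prints: 1 ['item 1: missing preconditions, steps, expected_result']
```

Rejecting incomplete cases instead of silently filling in defaults keeps gaps in the model's output visible, which matters when the goal is to judge how comprehensive and accurate LLM-generated tests are.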
