Evaluating the quality of user stories using large language models: an industry study

  • Erika Hernández-Agüero UCR / UNED
  • Christian Quesada-López UCR / UNED
  • José P. Chaves-Sánchez UNED

Abstract


The specification and maintenance of high-quality user stories are critical and challenging activities in agile software development, due to the dynamic nature of projects, the ambiguity of natural language, and the effort required for manual evaluations. This study investigates the use of large language models (LLMs) to assess the quality of user stories in an industrial software project, using the criteria defined by the INVEST framework. The performance of three LLM tools is compared with two evaluations conducted by requirements engineering experts. The results indicate that LLMs have the potential to support the automated assessment of user stories based on INVEST.
Keywords: software requirements, quality, NLP4RE, automation, LLM, user stories, INVEST

Published
2025-05-12
HERNÁNDEZ-AGÜERO, Erika; QUESADA-LÓPEZ, Christian; CHAVES-SÁNCHEZ, José P. Evaluating the quality of user stories using large language models: an industry study. In: IBERO-AMERICAN CONFERENCE ON SOFTWARE ENGINEERING (CIBSE), 28., 2025, Ciudad Real/Spain. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 45-59. DOI: https://doi.org/10.5753/cibse.2025.35291.