Classifying BDD Software Tests Using Machine Learning and NLP Techniques

  • David Ferreira Brandão UPE
  • Cleyton Mário de Oliveira Rodrigues UPE
  • Wylliams Barbosa Santos UPE

Abstract


The growing volume and complexity of Behavior-Driven Development (BDD) test scripts make manual validation time-consuming, error-prone, and difficult to standardize. To address this, the study proposes an automated approach to analyzing BDD scenarios that combines Natural Language Processing (NLP) and Machine Learning (ML) techniques. The solution has two components: an NLP-based validator that detects structural and linguistic inconsistencies in Gherkin steps, and a supervised ML classifier that assigns each step to one of three functional roles (Precondition, Action, or Expected Result), independent of the original Gherkin keywords. The classification model was trained on a labeled dataset of 1,500 synthetic BDD steps, evaluated using accuracy, precision, recall, and F1-score, and further validated on real-world test data. The system provides near-instant feedback per step, which allows it to be integrated into real-time development workflows. The results demonstrate the feasibility of combining rule-based validation with ML classification to improve the quality, consistency, and maintainability of BDD test artifacts.
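
To make the two-component design concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not name the validation rules, the features, or the learning algorithm, so everything here is an assumption chosen for illustration: the three checks in `validate_step`, the multinomial Naive Bayes model with add-one smoothing, and the nine toy training steps in `TRAIN` (which are not taken from the paper's 1,500-step dataset). The tokenizer drops a leading Gherkin keyword so that the role is inferred from the step content rather than from `Given`/`When`/`Then`.

```python
import math
import re
from collections import Counter, defaultdict

GHERKIN_KEYWORDS = ("Given", "When", "Then", "And", "But")


def validate_step(step):
    """Rule-based check of one step; the three rules are illustrative assumptions."""
    issues = []
    text = step.strip()
    words = text.split()
    if not words or words[0] not in GHERKIN_KEYWORDS:
        issues.append("missing Gherkin keyword")
    if len(words) < 3:
        issues.append("step too short")
    if "  " in text:
        issues.append("repeated whitespace")
    return issues


def tokenize(step):
    """Lowercase word tokens, without the leading keyword (if any)."""
    words = re.findall(r"[a-z']+", step.lower())
    if words and words[0] in {k.lower() for k in GHERKIN_KEYWORDS}:
        words = words[1:]
    return words


class NaiveBayesRoleClassifier:
    """Multinomial Naive Bayes with add-one smoothing (assumed model choice)."""

    def fit(self, steps, labels):
        self.label_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for step, label in zip(steps, labels):
            self.word_counts[label].update(tokenize(step))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, step):
        total = sum(self.label_counts.values())
        scores = {}
        for label, n in self.label_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n / total)  # log prior
            for word in tokenize(step):
                score += math.log((counts[word] + 1) / denom)  # smoothed likelihood
            scores[label] = score
        return max(scores, key=scores.get)


# Toy, hand-written examples for illustration only (hypothetical data).
TRAIN = [
    ("Given the user is logged in", "Precondition"),
    ("Given the cart is empty", "Precondition"),
    ("And the account exists", "Precondition"),
    ("When the user clicks the checkout button", "Action"),
    ("When the user submits the form", "Action"),
    ("And the user enters a password", "Action"),
    ("Then the order confirmation is displayed", "Expected Result"),
    ("Then an error message is shown", "Expected Result"),
    ("And the total should be updated", "Expected Result"),
]

model = NaiveBayesRoleClassifier().fit(*zip(*TRAIN))

if __name__ == "__main__":
    for step in ["And the user clicks the login button", "the user logs in"]:
        print(step, "|", validate_step(step), "|", model.predict(step))
```

Because both functions work on a single step and need no external service, a sketch like this could run on every edit of a feature file. A real system would replace the toy rules and training data with the labeled dataset and evaluation metrics described in the abstract.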

Published
29/09/2025
BRANDÃO, David Ferreira; RODRIGUES, Cleyton Mário de Oliveira; SANTOS, Wylliams Barbosa. Classifying BDD Software Tests Using Machine Learning and NLP Techniques. In: ENCONTRO NACIONAL DE INTELIGÊNCIA ARTIFICIAL E COMPUTACIONAL (ENIAC), 22., 2025, Fortaleza/CE. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 712-723. ISSN 2763-9061. DOI: https://doi.org/10.5753/eniac.2025.14033.
