Who Should Test the Requirement? A Comparative Study on Requirements Classification for Assigning Test Teams using the Pre-Trained Models

  • Alay Nascimento, Sidia Institute of Science and Technology
  • Flávia Oliveira, Sidia Institute of Science and Technology
  • Leonardo Tiago, Sidia Institute of Science and Technology
  • Lennon Chaves, Sidia Institute of Science and Technology

Abstract


Analyzing software requirements is a complex task, particularly in projects with a large volume of requirements; when conducted manually, it is time-consuming and prone to human error. Moreover, once the requirements are implemented in the software, tests must be conducted to ensure their correct validation. Within a software institute, each new requirement can be assigned to one of two test teams (Team 1 and Team 2) responsible for ensuring coverage by updating or creating test cases, and there are instances in which a requirement is assigned to neither team or to both. Each test team validates a specific scope of requirements, so it is crucial that every requirement is analyzed and validated by the appropriate team: if a test team fails to validate a requirement within its scope, the software can be left vulnerable to defects. To mitigate these issues, this paper describes the use of pre-trained models, namely BERT, XLNet, and ELECTRA, to automate requirement classification and thereby determine which test team should validate each new requirement. We compared the models on accuracy, precision, recall, F1-score, and macro Area Under the Curve (AUC). XLNet demonstrated the best performance, achieving 93.16% macro AUC, while BERT achieved 91.28% and ELECTRA 90.17%. We also applied the non-parametric Friedman test to statistically validate the results, followed by the Conover squared-ranks post-hoc test, at a significance level of 0.05. The results indicate that XLNet outperformed BERT and ELECTRA, exhibiting a superior capacity for assigning requirements to the correct test teams. Given these promising results, the study demonstrates the viability of pre-trained models as a solution for optimizing the testing process in the software industry.
Keywords: Software Requirements, Classification, Pre-trained Model
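
To make the classification setup concrete, below is a minimal sketch of loading a pre-trained XLNet encoder with a sequence-classification head, assuming the Hugging Face transformers library. The checkpoint name, the two-team label set, and the sample requirement are illustrative assumptions, not the authors' exact configuration, and the classification head would still need fine-tuning on labeled requirements before its predictions are meaningful.

    # Minimal sketch: pre-trained XLNet with a classification head
    # (assumes the Hugging Face `transformers` library and PyTorch).
    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    LABELS = ["team_1", "team_2"]  # hypothetical test-team labels

    tokenizer = AutoTokenizer.from_pretrained("xlnet-base-cased")
    model = AutoModelForSequenceClassification.from_pretrained(
        "xlnet-base-cased", num_labels=len(LABELS)
    )
    model.eval()  # the head is randomly initialized until fine-tuned

    # Classify a single (hypothetical) requirement into a test team.
    requirement = "The device shall resume media playback after a call ends."
    inputs = tokenizer(requirement, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    print(LABELS[int(logits.argmax(dim=-1))])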
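
The statistical validation can likewise be sketched, assuming SciPy for the Friedman test and the scikit-posthocs package for the Conover post-hoc comparison; the per-fold AUC scores below are placeholders, not the paper's measured results.

    # Minimal sketch: Friedman test followed by a Conover post-hoc test
    # (assumes SciPy and the `scikit-posthocs` package; scores are fake).
    import numpy as np
    from scipy.stats import friedmanchisquare
    import scikit_posthocs as sp

    # Rows are evaluation folds; columns are XLNet, BERT, ELECTRA.
    scores = np.array([
        [0.94, 0.92, 0.90],
        [0.93, 0.91, 0.89],
        [0.92, 0.90, 0.91],
        [0.94, 0.91, 0.90],
        [0.93, 0.90, 0.88],
    ])

    # Friedman test for any difference among the three models.
    stat, p = friedmanchisquare(scores[:, 0], scores[:, 1], scores[:, 2])
    print(f"Friedman: statistic={stat:.3f}, p={p:.4f}")

    # Pairwise Conover test at alpha = 0.05 with Holm-adjusted p-values.
    if p < 0.05:
        print(sp.posthoc_conover_friedman(scores, p_adjust="holm"))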

References

NT Abdullaev and K Oghuz. 2023. Use of Machine Learning Models for Classification of Myographic Diseases. Biomedical Engineering 56, 5 (2023), 353–357.

A Anand and A Uddin. 2019. Importance of software testing in the process of software development. International Journal for Scientific Research and Development 12, 6 (2019).

Chenyang Bu, Yuxin Liu, Manzong Huang, Jianxuan Shao, Shengwei Ji, Wenjian Luo, and Xindong Wu. 2024. Layer-Wise Learning Rate Optimization for Task-Dependent Fine-Tuning of Pre-Trained Models: An Evolutionary Approach. ACM Transactions on Evolutionary Learning 4, 4 (2024), 1–23.

Kevin Clark, Minh-Thang Luong, Quoc V Le, and Christopher D Manning. 2020. ELECTRA: Pre-training text encoders as discriminators rather than generators. arXiv preprint arXiv:2003.10555 (2020).

William Jay Conover. 1999. Practical nonparametric statistics. John Wiley & Sons.

Richard A DeMillo. 2003. Software testing. In Encyclopedia of Computer Science. 1645–1649.

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). 4171–4186.

Edna Dias Canedo and Bruno Cordeiro Mendes. 2020. Software requirements classification using machine learning algorithms. Entropy 22, 9 (2020), 1057.

Daffa Hilmy Fadhlurrohman, Mira Kania Sabariah, Muhammad Johan Alibasa, and Jati Hiliamsyah Husen. 2023. Naive Bayes Classification Model for Precondition-Postcondition in Software Requirements. In 2023 International Conference on Data Science and Its Applications (ICoDSA). IEEE, 123–128.

Muhammad Fikriansyah, Hilal Nuha, and Muhammad Santriaji. 2023. A Deep Dive into Electra: Transfer Learning for Fine-Grained Text Classification on SST-2. In 2023 6th International Seminar on Research of Information Technology and Intelligent Systems (ISRITI). IEEE, 89–94.

Milton Friedman. 1937. The use of ranks to avoid the assumption of normality implicit in the analysis of variance. Journal of the American Statistical Association 32, 200 (1937), 675–701.

Sture Holm. 1979. A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics (1979), 65–70.

Jin Huang and Charles X Ling. 2005. Using AUC and accuracy in evaluating learning algorithms. IEEE Transactions on Knowledge and Data Engineering 17, 3 (2005), 299–310.

Azham Hussain, Emmanuel OC Mkpojiogu, and Fazillah Mohmad Kamal. 2016. The role of requirements in the success or failure of software projects. International Review of Management and Marketing 6, 7 (2016), 306–311.

Muhammad Amin Khan, Muhammad Sohail Khan, Inayat Khan, Shafiq Ahmad, and Shamsul Huda. 2023. Non-functional requirements identification and classification using transfer learning model. IEEE Access 11 (2023), 74997–75005.

Derya Kici, Aysun Bozanta, Mucahit Cevik, Devang Parikh, and Ayşe Başar. 2021. Text classification on software requirements specifications using transformer models. In Proceedings of the 31st Annual International Conference on Computer Science and Software Engineering. 163–172.

Yu Beng Leau, Wooi Khong Loo, Wai Yip Tham, and Soo Fun Tan. 2012. Software development life cycle AGILE vs traditional approaches. In International Conference on Information and Network Technology, Vol. 37. 162–167.

George A Miller. 1995. WordNet: a lexical database for English. Commun. ACM 38, 11 (1995), 39–41.

Changan Niu, Chuanyi Li, Vincent Ng, Dongxiao Chen, Jidong Ge, and Bin Luo. 2023. An empirical comparison of pre-trained models of source code. In 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE). IEEE, 2136–2148.

William S Noble. 2006. What is a support vector machine? Nature Biotechnology 24, 12 (2006), 1565–1567.

Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, et al. 2011. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research 12 (2011), 2825–2830.

Klaus Pohl. 2016. Requirements engineering fundamentals: a study guide for the certified professional for requirements engineering exam-foundation level-IREB compliant. Rocky Nook, Inc.

Martin F Porter. 1980. An algorithm for suffix stripping. Program 14, 3 (1980), 130–137.

Irina Rish et al. 2001. An empirical study of the naive Bayes classifier. In IJCAI 2001 Workshop on Empirical Methods in Artificial Intelligence, Vol. 3. Seattle, USA, 41–46.

Samuel Sanford Shapiro and Martin B Wilk. 1965. An analysis of variance test for normality (complete samples). Biometrika 52, 3-4 (1965), 591–611.

Navnath Shete and Avinash Jadhav. 2014. An empirical study of test cases in software testing. In International Conference on Information Communication and Embedded Systems (ICICES2014). IEEE, 1–5.

Ahmad F Subahi. 2023. BERT-based approach for greening software requirements engineering through non-functional requirements. IEEE Access 11 (2023), 103001–103013.

Chi Sun, Xipeng Qiu, Yige Xu, and Xuanjing Huang. 2019. How to fine-tune BERT for text classification? In China National Conference on Chinese Computational Linguistics. Springer, 194–206.

Chetan Surana Rajender Kumar Surana, Dipesh B Gupta, Sahana P Shankar, et al. 2019. Intelligent chatbot for requirements elicitation and classification. In 2019 4th International Conference on Recent Trends on Electronics, Information, Communication & Technology (RTEICT). IEEE, 866–870.

Arpa Tasnim, Nazneen Akhter, Mohotina Khanam, and Nusrat Jahan Rimi. 2023. An Attention Based LSTM Model: Automated Requirement Classification from User Story. In 2023 International Conference on Information and Communication Technology for Sustainable Development (ICICT4SD). IEEE, 305–309.

Michael Unterkalmsteiner, Robert Feldt, and Tony Gorschek. 2014. A taxonomy for requirements engineering and software test alignment. ACM Transactions on Software Engineering and Methodology (TOSEM) 23, 2 (2014), 1–38.

Guoqiang Wu, Chongxuan Li, and Yilong Yin. 2023. Towards understanding generalization of Macro-AUC in multi-label learning. In International Conference on Machine Learning. PMLR, 37540–37570.

Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Russ R Salakhutdinov, and Quoc V Le. 2019. XLNet: Generalized autoregressive pretraining for language understanding. Advances in Neural Information Processing Systems 32 (2019).

Yuan Yao, Lorenzo Rosasco, and Andrea Caponnetto. 2007. On early stopping in gradient descent learning. Constructive Approximation 26, 2 (2007), 289–315.

Yong Yu, Xiaosheng Si, Changhua Hu, and Jianxun Zhang. 2019. A review of recurrent neural networks: LSTM cells and network architectures. Neural Computation 31, 7 (2019), 1235–1270.

Ting Zhang, Bowen Xu, Ferdian Thung, Stefanus Agus Haryono, David Lo, and Lingxiao Jiang. 2020. Sentiment analysis for software engineering: How far can pre-trained transformer models go?. In 2020 IEEE International Conference on Software Maintenance and Evolution (ICSME). IEEE, 70–80.
Published
September 22, 2025
NASCIMENTO, Alay; OLIVEIRA, Flávia; TIAGO, Leonardo; CHAVES, Lennon. Who Should Test the Requirement? A Comparative Study on Requirements Classification for Assigning Test Teams using the Pre-Trained Models. In: SIMPÓSIO BRASILEIRO DE ENGENHARIA DE SOFTWARE (SBES), 39., 2025, Recife/PE. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 671-677. ISSN 2833-0633. DOI: https://doi.org/10.5753/sbes.2025.11009.