Analysis of Contributions at a Software Institute through the Introduction of a Pre-Trained Model for Requirements Classification

  • Flávia Oliveira, Sidia Institute of Science and Technology
  • Alay Nascimento, Sidia Institute of Science and Technology
  • Ana Paula Silva, Sidia Institute of Science and Technology
  • Leonardo Tiago, Sidia Institute of Science and Technology
  • Lennon Chaves, Sidia Institute of Science and Technology

Abstract


Context: Ensuring that requirements are adequately covered by test cases is a challenge in the software industry. In particular, a Software Institute maintains a testing team that continuously analyzes requirements to ensure they are implemented in test cases. Problem: Requirements analysis, however, is a human-dependent process that must cope with the large volume of requirements the testing team receives. In addition, other activities compete with requirements analysis for effort and staff allocation, making it harder to guarantee that each requirement is analyzed and incorporated into a test case. Goal: To automate the requirements analysis process, we developed a tool based on XLNet, a pre-trained model, to classify each requirement and determine whether it falls within the testing team's scope. Method: To evaluate this tool, we conducted a study with a team of four members who analyze requirements, in which the participants performed the requirements analysis both manually and with the aid of the tool. The study comprised two analyses: (1) a quantitative one, evaluating effectiveness (correctly classified requirements) and efficiency (requirements analysis time), and (2) a qualitative one, in which we administered a questionnaire to collect the participants' feedback on the use of the tool. Results: Quantitatively, the statistical tests indicated no significant difference in efficiency between manual and automated classification, with a p-value of 0.8824. For effectiveness, a p-value of 0.0177 was obtained; however, the results showed that manual classification is still more effective than tool-assisted classification. Despite this, the qualitative results showed that 100% of the participants agreed that using the tool could improve their performance in the requirements analysis activity, and they identified positive aspects of its use, such as accuracy, speed of analysis, and a reduction in the effort dedicated to this activity.
Conclusions: The results show that the tool can bring benefits by automating the analysis and classification of requirements.
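The quantitative comparison reported above (effectiveness and efficiency p-values for manual versus tool-assisted classification) can be sketched as follows. The abstract does not name the statistical test used, so the two-sided Mann-Whitney U test, its normal-approximation implementation, and all sample data here are illustrative assumptions, not the study's actual procedure or measurements.

```python
import math

def mann_whitney_p(a, b):
    """Two-sided Mann-Whitney U test p-value (normal approximation,
    no tie correction). Suitable only as an illustrative sketch."""
    # U statistic: count of pairs where a beats b (ties count as 0.5).
    u = sum((x > y) + 0.5 * (x == y) for x in a for y in b)
    n1, n2 = len(a), len(b)
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = abs(u - mu) / sigma
    # Two-sided p-value from the standard normal survival function.
    return 2 * (1 - 0.5 * (1 + math.erf(z / math.sqrt(2))))

# Hypothetical per-requirement analysis times (minutes) per condition.
manual_times = [5.2, 4.8, 6.1, 5.5, 4.9, 5.7, 6.0, 5.3]
tool_times = [5.0, 5.4, 4.7, 5.9, 5.1, 5.6, 4.85, 5.45]
p_efficiency = mann_whitney_p(manual_times, tool_times)

# Hypothetical per-participant fraction of correctly classified requirements.
manual_correct = [0.95, 0.90, 0.92, 0.94]
tool_correct = [0.88, 0.85, 0.89, 0.87]
p_effectiveness = mann_whitney_p(manual_correct, tool_correct)

print(f"efficiency p-value: {p_efficiency:.4f}")       # overlapping samples: not significant
print(f"effectiveness p-value: {p_effectiveness:.4f}") # separated samples: significant
```

With these invented samples the efficiency distributions overlap (large p-value, mirroring the study's non-significant 0.8824) while the effectiveness samples are fully separated (small p-value, mirroring the significant 0.0177); the numbers themselves carry no relation to the study's data.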

Keywords: Requirements Classification, Pre-trained Models, Software Testing

Published
2025-11-04

OLIVEIRA, Flávia; NASCIMENTO, Alay; SILVA, Ana Paula; TIAGO, Leonardo; CHAVES, Lennon. Analysis of Contributions at a Software Institute through the Introduction of a Pre-Trained Model for Requirements Classification. In: BRAZILIAN SOFTWARE QUALITY SYMPOSIUM (SBQS), 24., 2025, São José dos Campos/SP. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 356-364. DOI: https://doi.org/10.5753/sbqs.2025.13598.