Evaluating the Potential of Large Language Models in Security-Related Software Requirements Classification

Abstract


Effective classification of security-related software requirements is crucial for mitigating potential threats and ensuring robust system design. This study investigates the performance of Large Language Models (LLMs) in classifying security-related requirements compared to traditional Machine Learning (ML) methods. Using the SecReq, DOSSPRE, and PROMISE+ datasets, we evaluated ten LLMs across various prompt engineering strategies. The results demonstrate that LLMs achieve high accuracy and outperform traditional ML-based models in several evaluation scenarios, and that prompt engineering can significantly enhance a model’s ability to identify security-related requirements. This work underscores the domain-generalization capabilities of LLMs and their potential to streamline requirements classification without the feature engineering or dataset-specific fine-tuning that ML-based approaches often require. Researchers, practitioners, and tool developers can leverage these findings to advance automated approaches in security requirements engineering.
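As a rough illustration of the kind of pipeline the abstract describes, the sketch below builds zero-shot and few-shot prompts for a binary security / non-security decision and scores the predictions. It is only a sketch under stated assumptions: the prompt wording, the function names (`build_prompt`, `parse_label`, `evaluate`), the sample requirements, and the choice of precision, recall, and F1 as metrics are all hypothetical and are not taken from the study. The `fake_llm` function is a keyword-based stand-in so the example runs offline; a real experiment would replace it with a call to an actual model.

```python
from typing import Callable, Iterable, Sequence, Tuple

# Hypothetical instruction text; the study's actual prompts are not given here.
INSTRUCTION = "Classify each software requirement as 'security' or 'non-security'."


def build_prompt(requirement: str, examples: Sequence[Tuple[str, str]] = ()) -> str:
    """Build a zero-shot prompt, or a few-shot one when labeled examples are given."""
    lines = [INSTRUCTION, ""]
    for text, label in examples:
        lines += [f"Requirement: {text}", f"Answer: {label}", ""]
    lines += [f"Requirement: {requirement}", "Answer:"]
    return "\n".join(lines)


def parse_label(reply: str) -> bool:
    """Map a free-text reply to True (security) or False (non-security)."""
    return reply.strip().lower().startswith("security")


def evaluate(classify: Callable[[str], str],
             data: Iterable[Tuple[str, bool]]) -> dict:
    """Run the classifier over labeled requirements and compute binary metrics."""
    tp = fp = fn = tn = 0
    for requirement, is_security in data:
        predicted = parse_label(classify(build_prompt(requirement)))
        if predicted and is_security:
            tp += 1
        elif predicted and not is_security:
            fp += 1
        elif not predicted and is_security:
            fn += 1
        else:
            tn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def fake_llm(prompt: str) -> str:
    """Stand-in for a real LLM call (assumption): keyword match on the last requirement."""
    target = prompt.rsplit("Requirement:", 1)[1].lower()
    return "security" if any(k in target for k in ("encrypt", "authenticate")) else "non-security"


# Made-up sample requirements, for illustration only.
SAMPLE = [
    ("The system shall encrypt all stored passwords.", True),
    ("The UI shall use a blue theme.", False),
    ("Users must authenticate before accessing reports.", True),
    ("Reports shall load within 2 seconds.", False),
    ("Session tokens shall expire after 15 minutes.", True),
]

if __name__ == "__main__":
    print(evaluate(fake_llm, SAMPLE))
```

Passing labeled pairs as `examples` to `build_prompt` turns the same template into a few-shot prompt, which is one simple way to compare prompting strategies against a fixed evaluation loop.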

Keywords: Requirements Engineering, Large Language Models, Security, Non-Functional Requirements

Published
22/09/2025
MARTIN, Murilo; COUTINHO, Daniel; UCHÔA, Anderson; PEREIRA, Juliana Alves. Evaluating the Potential of Large Language Models in Security-Related Software Requirements Classification. In: SIMPÓSIO BRASILEIRO DE ENGENHARIA DE SOFTWARE (SBES), 39., 2025, Recife/PE. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 315-325. ISSN 2833-0633. DOI: https://doi.org/10.5753/sbes.2025.9935.