Security Vulnerability Risk Classification via Gaussian Processes and Active Learning
Abstract
Effective vulnerability management is essential for cybersecurity, but the shortage of skilled professionals makes it challenging. Combining expert data labeling with machine learning techniques aims to produce models that emulate the experience of security professionals. This paper investigates the feasibility of using Gaussian Processes (GPs) with Active Learning to classify security vulnerabilities according to their risk of exploitation, with the goal of reducing the amount of labeled data needed to train an effective classifier. The proposed methodology combines the predictive uncertainty of GP models with five data selection strategies from the literature for choosing which instances to label. The experiments use the recently published CVEjoin data set, which contains information on more than 200,000 vulnerabilities. Three evaluation scenarios are considered, all with the same total amount of labeled data but different numbers of Active Learning iterations. The BSB strategy achieved the best accuracy and F1 score, especially with more labeling iterations.
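The selection step described above pairs a model's predictive uncertainty with a query strategy. The Python sketch below is illustrative only: it assumes a GP classifier has already produced a class-probability vector for each unlabeled vulnerability, and every function name and toy probability is hypothetical. Reading "BSB" as a best-versus-second-best margin criterion is an assumption on our part, and the three criteria shown need not coincide with the five strategies evaluated in the paper. The helper `batch_sizes` (also hypothetical) shows how one fixed labeling budget can be spread over fewer or more iterations, as in the three evaluation scenarios.

```python
import math
from typing import Callable, List, Sequence

# Assumed input: one predictive class-probability vector per unlabeled
# vulnerability, e.g. as returned by a GP classifier (hypothetical here).
Probs = Sequence[float]


def least_confidence(p: Probs) -> float:
    """Higher when the most likely class has low probability."""
    return 1.0 - max(p)


def margin_bvsb(p: Probs) -> float:
    """Best-versus-second-best margin (assumed meaning of 'BSB').

    A small gap between the two most likely classes signals high
    uncertainty, so the negated gap is returned (larger = more uncertain).
    Requires at least two classes.
    """
    best, second = sorted(p, reverse=True)[:2]
    return -(best - second)


def entropy(p: Probs) -> float:
    """Shannon entropy of the predictive distribution (natural log)."""
    return -sum(q * math.log(q) for q in p if q > 0.0)


def select_batch(pool: List[Probs], k: int,
                 score: Callable[[Probs], float]) -> List[int]:
    """Return indices of the k most uncertain pool items under `score`."""
    ranked = sorted(range(len(pool)), key=lambda i: score(pool[i]), reverse=True)
    return ranked[:k]


def batch_sizes(budget: int, iterations: int) -> List[int]:
    """Split a fixed labeling budget as evenly as possible across iterations."""
    base, extra = divmod(budget, iterations)
    return [base + (1 if i < extra else 0) for i in range(iterations)]


if __name__ == "__main__":
    # Toy probabilities for four unlabeled items over three risk classes.
    pool = [[0.9, 0.05, 0.05], [0.4, 0.35, 0.25], [0.5, 0.5, 0.0], [0.7, 0.2, 0.1]]
    for name, fn in [("least confidence", least_confidence),
                     ("BvSB margin", margin_bvsb), ("entropy", entropy)]:
        print(name, select_batch(pool, 2, fn))
    print(batch_sizes(100, 3))
```

With the toy pool, the margin criterion ranks the item with a 0.5/0.5 tie first, while entropy prefers the item whose probability mass is spread across all three classes. In a full loop, the selected items would be sent to an expert for labeling and the GP model refit before the next iteration.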
Published: 2024-09-16
How to Cite
RIBEIRO, Davyson S.; LEMOS, Rafael; PONTE, Francisco R. P. da; MATTOS, César Lincoln C.; RODRIGUES, Emanuel B. Security Vulnerability Risk Classification via Gaussian Processes and Active Learning. In: BRAZILIAN SYMPOSIUM ON CYBERSECURITY (SBSEG), 24., 2024, São José dos Campos/SP. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2024. p. 107-122. DOI: https://doi.org/10.5753/sbseg.2024.241782.
