Feature Selection with High Information Content for Intrusion Detection Systems Based on the Pareto Dominance Set

  • Guilherme Nunes Nasseh Barbosa UFF
  • Diogo Menezes Ferrazani Mattos UFF

Abstract


The COVID-19 pandemic changed the profile of Internet use, fostering an increase in attacks and new threats against institutions that had previously been rarely targeted. In this new scenario, threat detection and prevention tools tend to be replaced by machine learning-based solutions, which must execute efficiently. This article proposes an efficient feature selection method for machine learning based on the Pareto frontier. The proposal minimizes both the Pearson correlation and the mutual information between pairs of selected features. The selected dominant features were applied to three machine learning models for classifying malicious flows. Compared with other methods, the proposed method was efficient: it achieves similar accuracy, precision, and recall with fewer features, reducing training and validation time.
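
The idea of filtering features through a Pareto frontier over two redundancy measures can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the pairwise scores are aggregated, so the code below assumes each feature is scored by its mean absolute Pearson correlation and its mean mutual information with every other feature, and that mutual information is estimated from equal-width histogram bins. The function names (`pearson`, `mutual_information`, `pareto_front`, `select_features`) and the `bins` parameter are hypothetical.

```python
import math
from collections import Counter


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if one is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def discretize(x, bins):
    """Map values to equal-width bin indices in [0, bins - 1]."""
    lo, hi = min(x), max(x)
    if hi == lo:
        return [0] * len(x)
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in x]


def mutual_information(x, y, bins=4):
    """Histogram-based mutual information estimate, in nats (an assumed estimator)."""
    dx, dy = discretize(x, bins), discretize(y, bins)
    n = len(dx)
    cx, cy, cxy = Counter(dx), Counter(dy), Counter(zip(dx, dy))
    return sum((c / n) * math.log(c * n / (cx[a] * cy[b]))
               for (a, b), c in cxy.items())


def pareto_front(points):
    """Indices of points that no other point dominates (all objectives minimized)."""
    front = []
    for i, p in enumerate(points):
        dominated = any(
            all(qk <= pk for qk, pk in zip(q, p))
            and any(qk < pk for qk, pk in zip(q, p))
            for j, q in enumerate(points) if j != i
        )
        if not dominated:
            front.append(i)
    return front


def select_features(columns, bins=4):
    """Keep the features whose (mean |correlation|, mean MI) score is non-dominated.

    `columns` maps a feature name to its list of values; at least two
    features are required. The per-feature averaging is an assumption.
    """
    names = list(columns)
    scores = []
    for f in names:
        others = [g for g in names if g != f]
        corr = sum(abs(pearson(columns[f], columns[g])) for g in others) / len(others)
        mi = sum(mutual_information(columns[f], columns[g], bins) for g in others) / len(others)
        scores.append((corr, mi))
    return [names[i] for i in pareto_front(scores)]
```

Under these assumptions, calling `select_features` on a dictionary of numeric columns returns the names of the features that are least redundant on both measures at once. A feature is dropped only when some other feature scores no worse on both objectives and strictly better on at least one.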

Published
2023-05-22
BARBOSA, Guilherme Nunes Nasseh; MATTOS, Diogo Menezes Ferrazani. Seleção de Características com Alta Quantidade de Informação para Sistemas de Detecção de Intrusão baseada no Conjunto de Dominância de Pareto. In: BRAZILIAN SYMPOSIUM ON COMPUTER NETWORKS AND DISTRIBUTED SYSTEMS (SBRC), 41., 2023, Brasília/DF. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2023. p. 169-182. ISSN 2177-9384. DOI: https://doi.org/10.5753/sbrc.2023.546.