Cluster Feature Selection to improve Network Attack Detection

Abstract


Machine Learning (ML) based Intrusion Detection Systems (IDSs) have emerged as a key tool for detecting malicious traffic and network attacks. However, these approaches still struggle to detect diverse and constantly evolving attacks. Among the steps required in ML-based evaluation, feature selection is of great importance, because it is the step in which the most relevant network features are chosen for the algorithms to use when detecting traffic anomalies or attacks. However, knowing which features yield better attack detection remains an open issue. This paper tackles this problem with a cluster-based feature selection approach for detecting network attacks, and ranks the features that contributed to high detection performance for each evaluated attack. Our approach outperformed all the other evaluated proposals for five different types of network attacks in terms of F1 score.

Keywords: Feature Selection, Cluster, Network Attack

Published
2020-12-07
DE ABREU, Diego Medeiros; CARVALHO, Igor Furtado; ABELÉM, Antônio Jorge Gomes; MENASCHÉ, Daniel Sadoc; LEÃO, Rosa Maria Meri; SILVA, Edmundo Souza. Cluster Feature Selection to improve Network Attack Detection. In: BRAZILIAN SYMPOSIUM ON COMPUTER NETWORKS AND DISTRIBUTED SYSTEMS (SBRC), 38., 2020, Rio de Janeiro. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2020. p. 295-308. ISSN 2177-9384. DOI: https://doi.org/10.5753/sbrc.2020.12290.
