Use of Adjusted Mutual Information in Feature Selection in an Intrusion Detection Database
Abstract
Intrusion Detection Systems (IDSs) are essential for monitoring networks and identifying anomalous behavior. This paper applies a feature selection technique to the NSL-KDD dataset to extract its most representative attributes, using a hybrid approach that combines the Information Gain Ratio with the K-means algorithm. The Adjusted Mutual Information (AMI) metric is used to determine the optimal attribute subset. The technique reduced the dimensionality from 41 to 7 attributes while achieving 70% accuracy, demonstrating the effectiveness of the proposed approach.
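The abstract names three ingredients (Information Gain Ratio ranking, K-means clustering, and AMI-based subset selection) but does not fix how they interact. Below is a minimal, standard-library-only Python sketch of one plausible reading, not the authors' implementation. Features are ranked by gain ratio. K-means, with k assumed equal to the number of classes, is run on each top-m prefix of that ranking, and the prefix whose cluster assignments reach the highest AMI against the class labels is kept. Further assumptions made here: feature values are treated as discrete categories when computing gain ratio, so continuous NSL-KDD attributes would first need discretizing. AMI uses arithmetic-mean normalization with the usual hypergeometric expected-mutual-information correction. K-means uses a simple deterministic initialization. All function names (`gain_ratio`, `ami`, `kmeans`, `select_features`) and the toy data are hypothetical.

```python
import math
from collections import Counter

# Hypothetical sketch. The pipeline order, k = number of classes, and the
# prefix search over the gain-ratio ranking are ASSUMPTIONS, not the paper's method.


def entropy(values):
    """Shannon entropy (natural log) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log(c / n) for c in Counter(values).values())


def gain_ratio(feature, labels):
    """Information Gain Ratio of one discrete feature w.r.t. the class labels."""
    n = len(labels)
    conditional = 0.0
    for value, count in Counter(feature).items():
        subset = [y for x, y in zip(feature, labels) if x == value]
        conditional += (count / n) * entropy(subset)
    split_info = entropy(feature)
    return (entropy(labels) - conditional) / split_info if split_info > 0 else 0.0


def log_factorial(x):
    return math.lgamma(x + 1)


def ami(u, v):
    """Adjusted Mutual Information, arithmetic-mean normalization (assumed)."""
    n = len(u)
    a, b = Counter(u), Counter(v)
    # Trivial partitions (one cluster each, or all singletons each) are identical.
    if len(a) == len(b) == 1 or len(a) == len(b) == n:
        return 1.0
    joint = Counter(zip(u, v))
    mi = sum((nij / n) * math.log(n * nij / (a[i] * b[j]))
             for (i, j), nij in joint.items())
    # Expected MI over random labelings with the same cluster sizes
    # (hypergeometric model of each contingency-table cell n_ij).
    emi = 0.0
    for ai in a.values():
        for bj in b.values():
            for nij in range(max(1, ai + bj - n), min(ai, bj) + 1):
                log_p = (log_factorial(ai) + log_factorial(bj)
                         + log_factorial(n - ai) + log_factorial(n - bj)
                         - log_factorial(n) - log_factorial(nij)
                         - log_factorial(ai - nij) - log_factorial(bj - nij)
                         - log_factorial(n - ai - bj + nij))
                emi += (nij / n) * math.log(n * nij / (ai * bj)) * math.exp(log_p)
    denom = 0.5 * (entropy(u) + entropy(v)) - emi
    # Convention chosen for this sketch: a degenerate denominator scores 0.
    return (mi - emi) / denom if denom > 1e-12 else 0.0


def squared_distance(p, q):
    return sum((x - y) ** 2 for x, y in zip(p, q))


def kmeans(points, k, iters=100):
    """Plain Lloyd's K-means with a deterministic, evenly spaced initialization."""
    centers = [list(points[i * len(points) // k]) for i in range(k)]
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda c: squared_distance(p, centers[c]))
                      for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster empties out
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def select_features(rows, labels, k):
    """Rank features by gain ratio, then keep the top-m prefix whose K-means
    clustering agrees best (highest AMI) with the class labels."""
    n_features = len(rows[0])
    ranking = sorted(range(n_features),
                     key=lambda f: gain_ratio([r[f] for r in rows], labels),
                     reverse=True)
    best_subset, best_score = [], float("-inf")
    for m in range(1, n_features + 1):
        subset = ranking[:m]
        points = [[r[f] for f in subset] for r in rows]
        score = ami(labels, kmeans(points, k))
        if score > best_score:  # strict: ties keep the smaller subset
            best_subset, best_score = subset, score
    return best_subset, best_score


if __name__ == "__main__":
    # Hypothetical toy data: feature 0 separates the classes, feature 1 is noise.
    rows = [[0, 5], [0, 3], [0, 9], [10, 4], [10, 8], [10, 2]]
    labels = ["normal"] * 3 + ["attack"] * 3
    print(select_features(rows, labels, k=2))
```

Scanning prefixes of a single ranking keeps the search linear in the number of attributes (41 K-means runs for NSL-KDD) instead of exploring every subset. A real experiment would more likely use a library implementation, such as scikit-learn's `adjusted_mutual_info_score` and `KMeans`, in place of these hand-written versions.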
Keywords:
intrusion detection, feature selection, adjusted mutual information, NSL-KDD, machine learning
Published: 2024-12-05
How to Cite
MARTINS, Luiz E. R.; ARAÚJO, Nelcileno Virgílio de Souza; DE OLIVEIRA, Allan G.; EUGÊNIO, Letízia Manuella Serqueira. Use of Adjusted Mutual Information in Feature Selection in an Intrusion Detection Database. In: REGIONAL SCHOOL ON INFORMATICS OF GOIÁS (ERI-GO), 12., 2024, Ceres/GO. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2024. p. 231-234. DOI: https://doi.org/10.5753/erigo.2024.4828.
