Cluster-Based Feature Selection to Improve Network Attack Detection

Diego Medeiros de Abreu; Igor Furtado Carvalho; Antônio Jorge Gomes Abelém; Daniel Sadoc Menasché; Rosa Maria Meri Leão; Edmundo Souza Silva

doi:10.5753/sbrc.2020.12290

Diego Medeiros de Abreu UFPA http://orcid.org/0000-0001-9221-7853
Igor Furtado Carvalho UFPA
Antônio Jorge Gomes Abelém UFPA http://orcid.org/0000-0003-4085-6674
Daniel Sadoc Menasché UFRJ http://orcid.org/0000-0002-8953-4003
Rosa Maria Meri Leão UFRJ http://orcid.org/0000-0001-6411-9252
Edmundo Souza Silva UFRJ http://orcid.org/0000-0003-0912-7860

Abstract

Intrusion Detection Systems (IDSs) based on machine learning (ML) have been widely used to detect malicious traffic and network attacks. However, these approaches still struggle to detect the different types of attacks, which keep evolving. Among the steps required for an ML-based evaluation, feature selection plays a major role in making the detection of anomalies and network attacks more efficient, and it remains an open problem. This paper proposes an approach that performs cluster-based feature selection to improve the detection of attacks and anomalous traffic in the network. The proposal also produces a ranking of the traffic features that contributed most to the increase in the algorithms' correct detections. For five different types of attacks, the results showed performance superior to that of the other proposals evaluated, considering the F1 score metric.

Keywords: Feature Selection, Clustering, Network Attacks
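
To make the idea of cluster-based feature selection with a feature ranking more concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the authors' method: the abstract does not say which clustering algorithm or relevance measure is used, so this sketch groups features greedily by Pearson correlation and keeps, from each group, the feature most correlated with the label. The function names, the `threshold` parameter, and the toy feature names (`pkts`, `bytes`, `duration`) are all hypothetical. The last function computes the F1 score from its standard definition.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def cluster_features(columns, threshold=0.9):
    """Assumed grouping rule: a feature joins the first cluster whose first
    member it correlates with (in absolute value) at or above the threshold."""
    clusters = []
    for name in columns:
        for cluster in clusters:
            if abs(pearson(columns[name], columns[cluster[0]])) >= threshold:
                cluster.append(name)
                break
        else:
            clusters.append([name])
    return clusters


def select_and_rank(columns, labels, threshold=0.9):
    """Keep one feature per cluster (the most label-correlated one) and
    return the kept features ranked from most to least relevant."""
    relevance = {name: abs(pearson(vals, labels)) for name, vals in columns.items()}
    chosen = [max(c, key=relevance.get) for c in cluster_features(columns, threshold)]
    return sorted(chosen, key=relevance.get, reverse=True)


def f1_score(y_true, y_pred):
    """F1 = harmonic mean of precision and recall for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy flow records (hypothetical): "bytes" is nearly redundant with "pkts".
columns = {
    "pkts": [1, 2, 3, 4, 5, 6],
    "bytes": [10, 20, 30, 40, 50, 61],
    "duration": [5, 1, 4, 2, 6, 3],
}
labels = [0, 0, 0, 1, 1, 1]  # 1 = attack, 0 = benign

ranked = select_and_rank(columns, labels)
print(ranked)
print(f1_score([1, 1, 0, 0], [1, 0, 0, 1]))
```

In this toy data the two redundant features fall into one cluster, so only one of them survives selection; a real evaluation would compare the F1 score of a classifier trained on the selected features against one trained on the full set.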

