Characterization and Classification of Darknet Traffic with Decision Tree-Based Models
Abstract
The Darknet is a set of networks and technologies whose fundamental principles are anonymity and security. These networks are often associated with illicit activities, which opens room for malware traffic and attacks on legitimate services. Preventing Darknet misuse requires classifying and characterizing its traffic. In this paper, we characterize and classify real Darknet traffic from the CIC-Darknet2020 dataset. To this end, we performed feature extraction and grouped possible subnets with an n-gram approach. Furthermore, we evaluated how relevant the best features selected by the Recursive Feature Elimination (RFE) method are to the problem. Our results indicate that simple models, such as Decision Trees and Random Forests, reach an accuracy above 99% on traffic classification, a gain of up to 13% over the state of the art.
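The abstract names an n-gram grouping of subnets but does not spell out the tokenization. The following is a minimal Python sketch under stated assumptions, not the paper's implementation: each n-gram is taken to be the prefix formed by the first n octets of an IPv4 address (so n = 1, 2, 3 roughly correspond to /8, /16 and /24 subnets), and the n-grams are turned into a fixed-length vector with the hashing trick. The function names `subnet_ngrams`, `group_by_subnet` and `hashed_ngram_features`, the default of 16 buckets, and the use of CRC-32 as the hash are all hypothetical choices made for illustration.

```python
import zlib
from collections import Counter


def subnet_ngrams(ip: str, max_n: int = 3) -> list[str]:
    """Return the leading-octet n-grams of a dotted-quad IPv4 address.

    Assumption (illustration only): the n-gram made of the first n octets
    stands for a candidate subnet, i.e. n=1 ~ /8, n=2 ~ /16, n=3 ~ /24.
    """
    octets = ip.split(".")
    if len(octets) != 4 or not all(o.isdecimal() and int(o) <= 255 for o in octets):
        raise ValueError(f"not a dotted-quad IPv4 address: {ip!r}")
    if not 1 <= max_n <= 4:
        raise ValueError("max_n must be between 1 and 4")
    return [".".join(octets[:n]) for n in range(1, max_n + 1)]


def group_by_subnet(ips: list[str], n: int = 3) -> Counter:
    """Count flows per candidate subnet, keyed by the n-octet prefix."""
    return Counter(subnet_ngrams(ip, n)[-1] for ip in ips)


def hashed_ngram_features(ip: str, n_buckets: int = 16, max_n: int = 3) -> list[int]:
    """Map the subnet n-grams of one address to a fixed-length count vector.

    Hypothetical encoding via the hashing trick. zlib.crc32 is used instead
    of the built-in hash() because str hashing is salted per process, which
    would make the bucket assignment non-reproducible across runs.
    """
    vec = [0] * n_buckets
    for token in subnet_ngrams(ip, max_n):
        vec[zlib.crc32(token.encode("ascii")) % n_buckets] += 1
    return vec


if __name__ == "__main__":
    flows = ["10.0.0.5", "10.0.0.9", "10.0.1.7", "192.168.1.10"]
    print(subnet_ngrams("192.168.1.10"))  # ['192', '192.168', '192.168.1']
    print(group_by_subnet(flows, n=3))
    print(hashed_ngram_features("192.168.1.10"))
```

Hashing keeps the number of columns fixed no matter how many distinct subnets appear in the traffic, at the cost of occasional collisions between subnets that land in the same bucket. Under these assumptions, the resulting columns could be appended to the extracted flow features before a selection step such as RFE and the tree-based classifiers; that part is not shown here.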
