Distributed Decision Forest: A Peer-to-Peer Collaborative Machine Learning System for Network Intrusion Detection

  • Lucas Fauster Leite Pereira (UFF)
  • Igor Monteiro Moraes (UFF)
  • Diogo Menezes Ferrazani Mattos (UFF)

Abstract


Distributed machine learning allows Intrusion Detection System models to be trained collaboratively: each participant shares only its locally trained model and keeps its local data on its own devices. This work proposes a distributed machine learning system for intrusion detection based on a peer-to-peer communication topology. The key idea is for each participant to share a Decision Tree model; together, the shared trees make up a Distributed Decision Forest. The proposal is simulated and compared against a federated Intrusion Detection System that uses a parameter-server communication topology and deploys a neural network. In the simulations, the Distributed Decision Forest reaches a median accuracy of 79% after a single aggregation round, whereas the neural network reaches a median accuracy of 86%, but only after ten aggregation rounds. The results indicate that the Distributed Decision Forest achieves performance comparable to the federated neural network while imposing less processing overhead and offering greater data privacy.
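
The idea of sharing trees rather than data can be illustrated with a minimal sketch. This is not the authors' implementation: depth-one decision trees (stumps) stand in for full Decision Trees, majority voting is an assumed aggregation rule, and the peers, feature layout, and synthetic traffic samples are hypothetical.

```python
from collections import Counter

def train_stump(rows, labels):
    """Fit a depth-one decision tree (stump) by exhaustive threshold search."""
    best = None  # (errors, feature, threshold, left_label, right_label)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0] if right else l_lab
            errors = sum(y != l_lab for y in left) + sum(y != r_lab for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    return best[1:]  # (feature, threshold, left_label, right_label)

def predict_stump(stump, row):
    f, t, l_lab, r_lab = stump
    return l_lab if row[f] <= t else r_lab

def predict_forest(forest, row):
    """Majority vote over all shared trees (assumed aggregation rule)."""
    votes = Counter(predict_stump(s, row) for s in forest)
    return votes.most_common(1)[0][0]

def local_data(peer_id):
    """Hypothetical local traffic: feature 0 is a rate, feature 1 is noise.
    Label 1 (attack) when the rate exceeds 50, else 0 (benign)."""
    rows = [[5 * i + peer_id, (7 * i + 3 * peer_id) % 11] for i in range(20)]
    labels = [1 if r[0] > 50 else 0 for r in rows]
    return rows, labels

# Each peer trains on its own data; only the trained tree is shared.
# In a peer-to-peer setting every peer would receive the others' trees,
# so each one ends up holding the same forest after a single round.
forest = [train_stump(*local_data(peer_id)) for peer_id in range(3)]

print(predict_forest(forest, [90, 3]))  # sample with a high rate
print(predict_forest(forest, [10, 3]))  # sample with a low rate
```

Only the tuples returned by `train_stump` cross peer boundaries here; the rows produced by `local_data` never leave the peer that owns them, which mirrors the privacy argument made in the abstract.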

Published
2023-05-22
PEREIRA, Lucas Fauster Leite; MORAES, Igor Monteiro; MATTOS, Diogo Menezes Ferrazani. Floresta de Decisão Distribuída: Um Sistema de Aprendizado de Máquina Colaborativo Par-a-Par para Detecção de Intrusão em Redes. In: BRAZILIAN SYMPOSIUM ON COMPUTER NETWORKS AND DISTRIBUTED SYSTEMS (SBRC), 41., 2023, Brasília/DF. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2023. p. 253-266. ISSN 2177-9384. DOI: https://doi.org/10.5753/sbrc.2023.469.
