A Big Data-Based Approach to Intrusion Detection with Machine Learning over Multi-Domain Data
Abstract
In this work, we propose a distributed, ensemble-based network intrusion detection system (NIDS) to improve accuracy and scalability in large-scale networks. Using Apache Spark and Kafka, we decouple event ingestion from inference, which enables high-throughput processing. Combining multiple classifiers improves generalization and reduces the accuracy loss observed across different datasets. Evaluations on the UNSW-NB15, CS-CIC-IDS, and BoT-IoT datasets show that the model outperforms traditional approaches, with F-Measure gains of up to 0.46 and a throughput of 1.07 million events per second.
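The two ideas named above, decoupling ingestion from inference and combining several classifiers, can be illustrated with a minimal sketch. This is not the implementation evaluated in this work: an in-memory queue stands in for a Kafka topic, a single consumer thread stands in for Spark workers, and the three threshold rules, their field names, and their thresholds are hypothetical placeholders for trained classifiers.

```python
import queue
import threading
from collections import Counter

# Hypothetical stand-in classifiers: each maps an event (a feature dict)
# to "attack" or "normal". Real models would be trained on labeled traffic.
def by_rate(event):
    return "attack" if event["pkts_per_s"] > 1000 else "normal"

def by_size(event):
    return "attack" if event["avg_bytes"] < 64 else "normal"

def by_duration(event):
    return "attack" if event["duration_s"] < 0.01 else "normal"

CLASSIFIERS = [by_rate, by_size, by_duration]

def ensemble_predict(event):
    """Return the label chosen by a majority of the classifiers."""
    votes = Counter(clf(event) for clf in CLASSIFIERS)
    return votes.most_common(1)[0][0]

def run_pipeline(events):
    """Ingest events on one thread and classify them on another."""
    buffer = queue.Queue()  # in-memory stand-in for a Kafka topic
    results = []

    def producer():
        for event in events:
            buffer.put(event)
        buffer.put(None)  # sentinel: no more events

    def consumer():
        while (event := buffer.get()) is not None:
            results.append(ensemble_predict(event))

    threads = [threading.Thread(target=producer),
               threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

events = [
    {"pkts_per_s": 5000, "avg_bytes": 40, "duration_s": 0.5},
    {"pkts_per_s": 10, "avg_bytes": 800, "duration_s": 2.0},
]
labels = run_pipeline(events)
```

Because the producer only writes to the buffer and the consumer only reads from it, either side can be scaled or replaced independently, which is the property the Kafka and Spark split is meant to provide. An odd number of binary classifiers avoids ties in the vote.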
