A Big Data-Based Approach for Intrusion Detection with Machine Learning on Multi-Domain Data

  • Vinicius M. S. de Oliveira PUCPR
  • Henrique M. S. de Oliveira PUCPR
  • Gabriel M. Santos PUCPR
  • Jhonatan Geremias PUCPR
  • Eduardo K. Viegas PUCPR

Abstract


In this work, we propose a distributed, ensemble-based NIDS to improve accuracy and scalability in large-scale networks. Using Apache Spark and Kafka, we decouple event ingestion from inference, ensuring high-speed processing. Using multiple classifiers improves generalization and reduces the loss of accuracy across different datasets. Evaluations on the UNSW-NB15, CS-CIC-IDS, and BoT-IoT datasets show that the model outperforms traditional approaches, with gains of up to 0.46 in F-Measure and a throughput of 1.07 million events per second.
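
As a rough illustration of how multiple classifiers can be combined into one decision per event, the sketch below uses simple majority voting. The abstract does not state which combination rule the authors use, so majority voting is an assumption here, and the three threshold rules, their feature positions, and the name `majority_vote` are hypothetical stand-ins for trained models.

```python
from collections import Counter
from typing import Callable, Sequence

Event = Sequence[float]            # hypothetical layout: (packets/s, bytes, duration in s)
Classifier = Callable[[Event], str]


def by_packet_rate(event: Event) -> str:
    # Hypothetical rule standing in for a trained classifier.
    return "attack" if event[0] > 1000 else "normal"


def by_byte_count(event: Event) -> str:
    # Hypothetical rule standing in for a trained classifier.
    return "attack" if event[1] > 5000 else "normal"


def by_duration(event: Event) -> str:
    # Hypothetical rule standing in for a trained classifier.
    return "attack" if event[2] < 0.01 else "normal"


def majority_vote(classifiers: Sequence[Classifier], event: Event) -> str:
    """Return the label predicted by the largest number of classifiers."""
    votes = Counter(clf(event) for clf in classifiers)
    # With an odd number of binary classifiers there is never a tie.
    return votes.most_common(1)[0][0]


ENSEMBLE = [by_packet_rate, by_byte_count, by_duration]

if __name__ == "__main__":
    print(majority_vote(ENSEMBLE, (2000.0, 6000.0, 1.0)))
    print(majority_vote(ENSEMBLE, (10.0, 100.0, 1.0)))
```

In a deployment like the one the abstract describes, a function of this kind would run on the inference side, consuming events that the ingestion layer has already queued, so that the voting step is not tied to the rate at which events arrive.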

Published
2025-09-01
OLIVEIRA, Vinicius M. S. de; OLIVEIRA, Henrique M. S. de; SANTOS, Gabriel M.; GEREMIAS, Jhonatan; VIEGAS, Eduardo K. A Big Data-Based Approach for Intrusion Detection with Machine Learning on Multi-Domain Data. In: WORKSHOP ON SCIENTIFIC INITIATION AND UNDERGRADUATE WORKS - BRAZILIAN SYMPOSIUM ON CYBERSECURITY (SBSEG), 25., 2025, Foz do Iguaçu/PR. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 283-292. DOI: https://doi.org/10.5753/sbseg_estendido.2025.10831.
