Reproducibility and Extensibility of Network Datasets: A Study of Packet Trace Replication

Luciano B. Fiorino; Maxwell E. Monteiro; Cristina K. Dominicini; Gilmar L. Vassoler; João H. Corrêa; Rodolfo S. Villaça

doi:10.5753/sbesc_estendido.2021.18500

Luciano B. Fiorino IFES
Maxwell E. Monteiro IFES
Cristina K. Dominicini IFES
Gilmar L. Vassoler IFES
João H. Corrêa UFES
Rodolfo S. Villaça UFES


Abstract

In general, applying machine learning algorithms to networking problems relies on datasets generated from packet traces. However, the process used to generate current datasets does not follow criteria that make it possible to identify the information needed to reproduce and extend them. This work therefore presents a detailed study of methods and tools for reproducing the network traffic of these datasets. In light of the reproducibility problems identified, we propose a methodology for generating packet-trace datasets that minimizes these problems, making it possible to reproduce their network traffic and to extend them with new data.

Keywords: traffic replication, packet traces, cloud computing, datasets, machine learning
