Analysis of the impact of IP flows aggregation on supervised machine learning algorithms for intrusion detection
Abstract
Machine learning has been used in cybersecurity to address the limitations of pattern identification techniques in network traffic. Because the literature offers numerous algorithms, choosing the one most suitable for intrusion detection is not a trivial task. This paper presents a comparative analysis of six supervised machine learning algorithms, evaluating the impact of IP flow aggregation on prediction quality, training time, and testing time. The experiments showed that the aggregation method improves classification and reduces the processing time of the models. In the analysis performed, the Decision Tree obtained the best balance across the results.
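To make the notion of IP flow aggregation concrete, the sketch below groups individual flow records into fixed time windows and sums their counters, so that a classifier would see fewer, denser rows. It is only an illustration under stated assumptions: the record layout, the 5-second window, the grouping key, and the names `FLOWS` and `aggregate_flows` are hypothetical and are not taken from the paper, whose exact aggregation method is not described in this abstract.

```python
from collections import defaultdict

# Hypothetical flow records (assumed layout, not from the paper):
# (timestamp_s, src_ip, dst_ip, dst_port, packets, bytes)
FLOWS = [
    (0.5, "10.0.0.1", "10.0.0.9", 22, 10, 1200),
    (1.2, "10.0.0.1", "10.0.0.9", 22, 12, 1500),
    (3.9, "10.0.0.2", "10.0.0.9", 80, 40, 52000),
    (6.1, "10.0.0.1", "10.0.0.9", 22, 11, 1300),
]


def aggregate_flows(flows, window_s=5.0):
    """Sum flow counters per (time window, src, dst, port) key.

    The window length and the grouping key are assumptions chosen
    for illustration only.
    """
    buckets = defaultdict(lambda: {"flows": 0, "packets": 0, "bytes": 0})
    for ts, src, dst, port, pkts, nbytes in flows:
        # Index of the fixed-size time window this flow falls into.
        window = int(ts // window_s)
        agg = buckets[(window, src, dst, port)]
        agg["flows"] += 1
        agg["packets"] += pkts
        agg["bytes"] += nbytes
    return dict(buckets)


AGGREGATED = aggregate_flows(FLOWS)
for key, counters in sorted(AGGREGATED.items()):
    print(key, counters)
```

Here the four input flows collapse into three aggregated records, which hints at why aggregation may shorten training and testing: the model processes fewer rows, each summarizing several flows.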
