Efficient Host Anomaly Detection Using a Transformer with System Call Categorization

Diogo Bortolini; Rafael R. Obelheiro; Carlos A. Maziero

doi:10.5753/sbseg.2025.10695

Diogo Bortolini (UFPR)
Rafael R. Obelheiro (UDESC)
Carlos A. Maziero (UFPR)

Abstract

This work proposes an efficient approach to host anomaly detection based on the analysis of system call (syscall) sequences. Syscalls are grouped by threat level and by functionality, yielding compact representations that a Transformer model analyzes as bigram frequency matrices. Four representations were evaluated; the one combining threat and functionality reached an F1-score of 0.90, close to the original model (0.92), while reducing execution time by more than 86%, which keeps performance competitive at a lower computational cost.
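
As a rough illustration of the kind of representation described above, the following Python sketch maps each syscall in a trace to a category and builds a matrix of bigram frequencies over those categories. The category map, the "other" fallback label, the function name `bigram_matrix`, and the normalization to relative frequencies are all assumptions made for this example; the abstract does not specify the actual taxonomy or the matrix construction.

```python
from collections import Counter

# Hypothetical category map: these labels are illustrative assumptions,
# not the paper's actual threat/functionality taxonomy.
CATEGORY = {
    "open": "file",
    "read": "file",
    "write": "file",
    "socket": "network",
    "connect": "network",
    "execve": "process",
}
# Unmapped syscalls fall into an assumed catch-all category.
CATEGORIES = sorted(set(CATEGORY.values()) | {"other"})
INDEX = {name: i for i, name in enumerate(CATEGORIES)}


def bigram_matrix(trace):
    """Return a square matrix of relative bigram frequencies over categories."""
    labels = [CATEGORY.get(call, "other") for call in trace]
    # Count consecutive category pairs (bigrams) in the trace.
    counts = Counter(zip(labels, labels[1:]))
    total = sum(counts.values())
    size = len(CATEGORIES)
    matrix = [[0.0] * size for _ in range(size)]
    for (first, second), n in counts.items():
        matrix[INDEX[first]][INDEX[second]] = n / total
    return matrix


trace = ["open", "read", "read", "write", "socket", "connect", "execve"]
matrix = bigram_matrix(trace)
```

The matrix size depends only on the number of categories (4 x 4 here), not on the number of distinct syscalls, which is one plausible reading of why a categorized representation would be cheaper to process than one built over raw syscalls.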
