DNApp: Challenges in Maintaining Host Integrity via Instruction-Based Application Characterization

Felipe Duarte Silva; Marco Zanata Alves; Paulo Lisboa de Almeida; André Grégio

doi:10.5753/sbseg.2025.11435

Felipe Duarte Silva, UFPR
Marco Zanata Alves, UFPR
Paulo Lisboa de Almeida, UFPR
André Grégio, UFPR

Abstract

This paper proposes DNApp, a method for identifying privileged executables on Linux through syntactic analysis of their assembly instructions. The method extracts opcode n-grams (bigrams, trigrams, and 4-grams) from binaries of five Ubuntu releases and vectorizes them with TF-IDF. The mean vectors are evaluated with k-means and the Silhouette coefficient, showing that 4-grams with 128–256 dimensions separate small samples better, whereas bigrams with 512 dimensions work better on larger sets. Functional clusters form consistently, although limitations remain, such as cluster overlap and restricted vectors. The method can help identify malicious modifications to privileged binaries without depending on static signatures.
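
As a rough illustration of the feature-extraction step described in the abstract, the sketch below builds opcode n-grams and weights them with TF-IDF, using only the Python standard library. It is a minimal sketch under stated assumptions, not the authors' implementation: the opcode sequences, the sample names (`tool_a`, `tool_b`, `tool_c`), the helper functions, and the smoothed IDF variant are all hypothetical choices made here. It also leaves out the dimensionality limit (128 to 512), the k-means clustering, and the Silhouette evaluation.

```python
import math
from collections import Counter


def opcode_ngrams(opcodes, n):
    """Return the contiguous opcode n-grams as space-joined strings."""
    return [" ".join(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1)]


def tfidf_vectors(docs, n):
    """Compute one sparse TF-IDF vector (dict) per opcode sequence.

    Assumption: tf is the relative n-gram frequency and idf is the
    smoothed form ln((1 + N) / (1 + df)) + 1. The paper may use another
    variant.
    """
    counts = [Counter(opcode_ngrams(doc, n)) for doc in docs]
    doc_freq = Counter()
    for c in counts:
        doc_freq.update(c.keys())  # each n-gram counted once per document
    n_docs = len(docs)
    vectors = []
    for c in counts:
        total = sum(c.values())
        vectors.append({
            gram: (k / total)
            * (math.log((1 + n_docs) / (1 + doc_freq[gram])) + 1.0)
            for gram, k in c.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(gram, 0.0) for gram, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical opcode streams; real ones would come from a disassembler.
samples = {
    "tool_a": ["push", "mov", "sub", "call", "mov", "add", "pop", "ret"],
    "tool_b": ["push", "mov", "sub", "call", "mov", "pop", "ret"],
    "tool_c": ["xor", "cmp", "jne", "xor", "cmp", "jne", "ret"],
}

names = list(samples)
vecs = dict(zip(names, tfidf_vectors([samples[k] for k in names], n=2)))

for other in ("tool_b", "tool_c"):
    print(f"tool_a vs {other}: {cosine(vecs['tool_a'], vecs[other]):.3f}")
```

In this toy input, `tool_a` and `tool_b` share most of their bigrams while `tool_a` and `tool_c` share none, so the first similarity is higher. Vectors of this kind are what a clustering step such as k-means would then group.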
