Using TF-IDF for Data Comparison in Ransomware Detection

  • Augusto Parisot UFF
  • Lucila M. S. Bento UERJ
  • Raphael C. S. Machado UFF

Abstract


Ransomware attacks are among the greatest cyber threats faced by users and organizations worldwide. This paper applies TF-IDF, a technique widely used in natural language processing, to data from dynamic analysis reports generated by Cuckoo Sandbox. We compare different types of data to determine which can be used most effectively to detect this threat. For the evaluation, we investigate preprocessing methods together with classical machine learning algorithms. The results indicate that Random Forest and SVM, when processing String data with StandardScaler, achieved an accuracy of up to 98%, standing out as the most effective approaches.
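As a rough illustration of the TF-IDF weighting mentioned in the abstract, the sketch below scores tokens across a few toy documents. It is not the authors' pipeline: the `tf_idf` helper, the token lists, and the unsmoothed `log(N / df)` form of the inverse document frequency are all assumptions made for this example, and library implementations usually apply smoothing and normalization that change the exact values.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per tokenized document.

    Assumption: tf is the term count divided by document length and
    idf is the plain log(N / df), with no smoothing.
    """
    n_docs = len(documents)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical token lists standing in for strings taken from sandbox reports.
reports = [
    ["CryptEncrypt", "CreateFileW", "CryptEncrypt", "DeleteFileW"],
    ["CreateFileW", "ReadFile", "CloseHandle"],
    ["CreateFileW", "WriteFile", "CloseHandle"],
]
vectors = tf_idf(reports)
```

In this toy data, a token that appears in every document (`CreateFileW`) receives a weight of zero, while a token that is frequent in one document and absent from the others (`CryptEncrypt`) receives the highest weight in that document. Vectors of this kind could then be scaled and passed to a classifier such as Random Forest or SVM.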

Published
16/09/2024
PARISOT, Augusto; BENTO, Lucila M. S.; MACHADO, Raphael C. S. Uso do TF-IDF na Comparação de Dados para Detecção de Ransomware. In: SIMPÓSIO BRASILEIRO DE SEGURANÇA DA INFORMAÇÃO E DE SISTEMAS COMPUTACIONAIS (SBSEG), 24., 2024, São José dos Campos/SP. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2024. p. 678-693. DOI: https://doi.org/10.5753/sbseg.2024.240700.