Phishing Attack Detection Using Language Models

Pedro M. M. Souza; João V. S. Santos; Antonio M. B. Neto; Francisco V. J. Nobre; Alex F. R. Trajano; Rafael L. Gomes

doi:10.5753/sbrc.2026.19297

Pedro M. M. Souza UECE
João V. S. Santos UECE
Antonio M. B. Neto UECE
Francisco V. J. Nobre UECE
Alex F. R. Trajano Instituto Atlântico
Rafael L. Gomes UECE

Abstract

The growing sophistication of phishing attacks, driven by generative AI, limits the effectiveness of traditional rule-based methods. This work presents VerificAI, a hybrid system for real-time detection of phishing in e-mails and SMS messages that integrates Large Language Models (LLMs), Small Language Models (SLMs), Retrieval-Augmented Generation (RAG), URL validation, and Active Learning. Users submit suspicious messages to a chatbot, which returns an automated, explainable response. Experiments on real-world datasets compare cloud-hosted and locally run models. The results show that VerificAI with cloud LLMs reaches an F1-score of up to 97%, while VerificAI with local SLMs offers competitive performance with lower latency and greater privacy.
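The abstract names the building blocks but not how they connect, so the following is only a minimal sketch of one way a hybrid flow of this kind could be wired together, and it should not be read as VerificAI's actual implementation. Everything in it is an assumption made for illustration: the blocklist contents, the keyword-based `stub_model_score` standing in for an LLM or SLM call, the thresholds, and the `ReviewQueue` that holds low-confidence messages for human labeling (the active-learning step). RAG is left out entirely.

```python
import re
from dataclasses import dataclass, field
from urllib.parse import urlparse

# Hypothetical blocklist; a real system would query a threat-intelligence source.
BLOCKED_DOMAINS = {"login-secure-bank.example"}

URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")


@dataclass
class Verdict:
    label: str   # "phishing", "legitimate", or "uncertain"
    reason: str  # short explanation shown to the user


@dataclass
class ReviewQueue:
    """Messages held for human labeling (assumed active-learning step)."""
    items: list = field(default_factory=list)


def extract_domains(message):
    """Return the lowercase host name of every http(s) URL in the message."""
    domains = []
    for url in URL_PATTERN.findall(message):
        host = urlparse(url).hostname
        if host:
            domains.append(host.lower().rstrip("."))
    return domains


def stub_model_score(message):
    """Placeholder for a language-model call; returns a score in [0, 1]."""
    cues = ("urgent", "verify your account", "password")
    hits = sum(cue in message.lower() for cue in cues)
    return hits / len(cues)


def classify(message, queue, low=0.3, high=0.6):
    # Step 1: URL validation short-circuits the model for known-bad domains.
    for domain in extract_domains(message):
        if domain in BLOCKED_DOMAINS:
            return Verdict("phishing", f"blocked domain: {domain}")

    # Step 2: score the text with the (stubbed) language model.
    score = stub_model_score(message)
    if score >= high:
        return Verdict("phishing", f"model score {score:.2f}")
    if score <= low:
        return Verdict("legitimate", f"model score {score:.2f}")

    # Step 3: ambiguous scores are queued for a human label.
    queue.items.append(message)
    return Verdict("uncertain", f"model score {score:.2f}, sent for review")


queue = ReviewQueue()
print(classify("Pay now at http://login-secure-bank.example/pay", queue))
print(classify("Your password expires soon", queue))
```

Checking URLs before calling the model keeps the cheapest signal first, and routing only mid-range scores to review limits how many messages need a human label. Both are design choices of this sketch rather than claims about the paper.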
