Detecting Phishing Attacks with Language Models (original title: Detecção de Ataques de Phishing por Meio de Modelos de Linguagem)

Pedro M. M. Souza; João V. S. Santos; Antonio M. B. Neto; Francisco V. J. Nobre; Alex F. R. Trajano; Rafael L. Gomes

doi:10.5753/sbrc.2026.19297

Pedro M. M. Souza (UECE)
João V. S. Santos (UECE)
Antonio M. B. Neto (UECE)
Francisco V. J. Nobre (UECE)
Alex F. R. Trajano (Instituto Atlântico)
Rafael L. Gomes (UECE)


Abstract

The increasing sophistication of phishing attacks, driven by generative AI, limits the effectiveness of traditional rule-based detection methods. This work presents VerificAI, a hybrid system for real-time phishing detection in email and SMS messages that integrates large language models (LLMs), small language models (SLMs), Retrieval-Augmented Generation (RAG), URL validation, and Active Learning. Users submit suspicious messages to a chatbot, which returns an automated, explainable verdict. Experiments on the Enron Spam and SMS Spam Collection datasets compare cloud-based and local models. The results show that cloud-based LLMs achieve F1-scores of up to 97%, while local SLMs deliver competitive performance with lower latency and enhanced privacy.
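For readers unfamiliar with the metric reported above, the F1-score is the harmonic mean of precision and recall on the positive (phishing) class. The following is a minimal sketch of that computation only; the function name, label strings, and sample data are illustrative assumptions and are not taken from VerificAI or from the datasets used in the paper.

```python
def f1_score(y_true, y_pred, positive="phishing"):
    """Return (precision, recall, f1) for the positive class.

    Hypothetical helper for illustration; not part of the VerificAI system.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Made-up labels, purely to exercise the function.
y_true = ["phishing", "phishing", "phishing", "legitimate", "legitimate", "phishing"]
y_pred = ["phishing", "phishing", "legitimate", "legitimate", "phishing", "phishing"]

precision, recall, f1 = f1_score(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Because F1 penalizes both missed phishing messages (false negatives) and legitimate messages flagged as phishing (false positives), it is a common choice for spam and phishing benchmarks, where the classes are often imbalanced.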

