AutoPhish: A Grammar-based AutoML Approach to Learn Classifiers for Phishing Detection

João Guilherme Miranda; Mateus L. S. D. Barros; Tapas Si; Carlo Marcelo R. Silva; Péricles B. C. Miranda

doi:10.5753/sbseg.2025.9658

João Guilherme Miranda UFRPE
Mateus L. S. D. Barros UFRPE
Tapas Si Bankura Unnayani Institute of Engineering
Carlo Marcelo R. Silva UPE
Péricles B. C. Miranda UFRPE


Abstract

The growing sophistication of cyber threats, phishing in particular, demands advanced and adaptive detection mechanisms to protect users and organizations. Traditional defenses struggle to keep pace with evolving techniques such as Clone Phishing, Spear Phishing, DNS-Based Phishing, and Man-in-the-Middle attacks. Recent research has extensively applied machine learning (ML) models to phishing detection, highlighting the importance of feature selection and classifier optimization. Although approaches based on rule systems, ensemble models, and artificial neural networks (ANNs) have shown promising results, their reliance on static datasets and general-purpose models limits their effectiveness in dynamic, real-world scenarios. This paper proposes AutoPhish, a new approach to phishing detection based on Grammatical Evolution (GE). GE is a grammar-guided genetic programming method capable of generating customized, optimized classifiers. AutoPhish implements a single-objective GE that aims to generate classifiers maximizing the F1-score. Our approach is evaluated in realistic scenarios with imbalanced datasets and compared against traditional machine learning algorithms using accuracy, recall, precision, and F1-score as performance metrics. The results show that classifiers evolved with AutoPhish consistently outperform most baseline methods, demonstrating high potential for practical application. This study highlights the value of evolutionary computation in cybersecurity and advances the development of adaptive, high-performance phishing detection systems.
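To make the idea of a grammar-guided search with an F1-score objective more concrete, the following is a minimal sketch of the standard GE genotype-to-phenotype mapping and an F1-based fitness function. It is not the AutoPhish grammar or implementation: the toy grammar, the feature names (`url_length`, `num_dots`, `has_at_symbol`), the tiny imbalanced dataset, and the helper names `map_genotype`, `f1_score`, and `fitness` are all assumptions made purely for illustration.

```python
# Hypothetical toy grammar (NOT the AutoPhish grammar): each non-terminal
# maps to a list of productions; each production is a list of symbols.
GRAMMAR = {
    "<rule>": [["<cond>"], ["<cond>", " or ", "<rule>"]],
    "<cond>": [["<feature>", " ", "<op>", " ", "<value>"]],
    "<feature>": [["url_length"], ["num_dots"], ["has_at_symbol"]],
    "<op>": [[">"], [">="]],
    "<value>": [["1"], ["3"], ["5"], ["50"], ["75"]],
}

# Illustrative, imbalanced data: 2 phishing samples (label 1) out of 8.
SAMPLES = [
    ({"url_length": 20, "num_dots": 1, "has_at_symbol": 0}, 0),
    ({"url_length": 35, "num_dots": 2, "has_at_symbol": 0}, 0),
    ({"url_length": 90, "num_dots": 5, "has_at_symbol": 0}, 1),
    ({"url_length": 60, "num_dots": 2, "has_at_symbol": 1}, 1),
    ({"url_length": 80, "num_dots": 2, "has_at_symbol": 0}, 0),
    ({"url_length": 25, "num_dots": 1, "has_at_symbol": 0}, 0),
    ({"url_length": 30, "num_dots": 3, "has_at_symbol": 0}, 0),
    ({"url_length": 40, "num_dots": 1, "has_at_symbol": 0}, 0),
]


def map_genotype(codons, start="<rule>", max_wraps=2):
    """Expand the leftmost non-terminal using codon % number_of_productions.

    Returns the derived rule as a string, or None if the codons (including
    wrapped reuse) run out before the derivation finishes.
    """
    symbols, out, used = [start], [], 0
    limit = len(codons) * (max_wraps + 1)
    while symbols:
        sym = symbols.pop(0)
        if sym not in GRAMMAR:          # terminal: emit as-is
            out.append(sym)
            continue
        choices = GRAMMAR[sym]
        if len(choices) == 1:           # no choice to make, no codon consumed
            chosen = choices[0]
        else:
            if used >= limit:           # ran out of codons: invalid individual
                return None
            chosen = choices[codons[used % len(codons)] % len(choices)]
            used += 1
        symbols = list(chosen) + symbols
    return "".join(out)


def f1_score(y_true, y_pred):
    """F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the denominator is 0."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def fitness(codons):
    """Single-objective fitness: F1-score of the rule encoded by the codons."""
    rule = map_genotype(codons)
    if rule is None:
        return 0.0                      # invalid individuals get the worst score
    y_true = [label for _, label in SAMPLES]
    # The rule only contains grammar terminals, so eval sees feature names only.
    y_pred = [int(eval(rule, {"__builtins__": {}}, feats)) for feats, _ in SAMPLES]
    return f1_score(y_true, y_pred)


if __name__ == "__main__":
    for genotype in ([0, 0, 0, 4], [1, 2, 1, 0, 0, 1, 0, 1]):
        print(map_genotype(genotype), "->", fitness(genotype))
```

In a full GE run, an evolutionary loop would apply selection, crossover, and mutation to the integer genotypes and keep the individuals with the highest fitness. On imbalanced data such as the toy set above, F1 is a more informative objective than accuracy, because a rule that never flags phishing would still score 75% accuracy here while its F1 would be 0.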

