AutoPhish: A Grammar-based AutoML Approach to Learn Classifiers for Phishing Detection
Abstract
The increasing sophistication of cyber threats, particularly phishing, demands advanced and adaptive detection mechanisms to protect users and organizations. Traditional defenses struggle to keep pace as phishing techniques such as Clone Phishing, Spear Phishing, DNS-Based Phishing, and Man-in-the-Middle attacks evolve. Recent research has extensively applied machine learning (ML) models to phishing detection, emphasizing the importance of attribute selection and classifier optimization. Although approaches based on rule-based systems, ensemble models, and artificial neural networks (ANNs) have shown promising results, their reliance on static datasets and generalized models limits their effectiveness in dynamic, real-world scenarios. This article proposes AutoPhish, a novel phishing detection approach based on Grammatical Evolution (GE). GE is a grammar-driven form of genetic programming in which integer genomes are decoded, through a context-free grammar, into programs; AutoPhish uses it to generate customized, optimized classifiers. AutoPhish implements a single-objective GE that evolves classifiers to maximize the F1-score. We evaluate the method in realistic settings with imbalanced datasets and compare it with traditional machine learning algorithms using performance metrics such as accuracy, recall, precision, and F1-score. The results show that classifiers evolved with AutoPhish consistently outperform most baseline methods, demonstrating strong potential for practical deployment. This study underscores the value of evolutionary computation in cybersecurity and advances the development of adaptive, high-performance phishing detection systems.
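The abstract describes GE only at a high level. The following is a minimal, dependency-free sketch of how a grammar-based GE loop with a single F1-score objective can work; it is not AutoPhish's implementation. The grammar, the three numeric features x[0..2], the toy imbalanced dataset, the evolutionary operators, and the names map_genotype, f1_score, fitness and evolve are all hypothetical assumptions for illustration. Where AutoPhish's grammar describes full classifiers, this toy grammar derives a simple threshold rule so that the example stays small.

```python
import random

# Hypothetical toy grammar (NOT AutoPhish's grammar). Each non-terminal maps
# to its list of productions; each production is a list of symbols.
GRAMMAR = {
    "<rule>": [["<cond>"], ["<cond>", " and ", "<cond>"], ["<cond>", " or ", "<cond>"]],
    "<cond>": [["x[", "<idx>", "] > ", "<thr>"], ["x[", "<idx>", "] <= ", "<thr>"]],
    "<idx>": [["0"], ["1"], ["2"]],
    "<thr>": [["0.2"], ["0.5"], ["0.8"]],
}


def map_genotype(genome, start="<rule>", max_wraps=2):
    """Decode an integer genome into a phenotype string (classic GE mapping).

    The leftmost non-terminal is expanded with production
    (codon mod number_of_productions); the genome wraps around up to
    max_wraps times. Returns None if it runs out of codons (invalid individual).
    """
    symbols, output, i = [start], [], 0
    limit = len(genome) * (max_wraps + 1)
    while symbols:
        sym = symbols.pop(0)
        if sym not in GRAMMAR:
            output.append(sym)  # terminal symbol: copy to the output
            continue
        if i >= limit:
            return None
        choices = GRAMMAR[sym]
        codon = genome[i % len(genome)]  # modulo index implements wrapping
        i += 1
        symbols = list(choices[codon % len(choices)]) + symbols
    return "".join(output)


def f1_score(y_true, y_pred):
    """F1 of the positive (phishing = 1) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def fitness(genome, X, y):
    """Single objective: F1-score of the decoded rule on (X, y)."""
    expr = map_genotype(genome)
    if expr is None:
        return 0.0
    # eval is tolerable here only because expr comes from the fixed grammar.
    preds = [int(eval(expr, {"__builtins__": {}}, {"x": row})) for row in X]
    return f1_score(y, preds)


def evolve(X, y, pop_size=30, genome_len=12, generations=20, seed=0):
    """Generational GA over integer genomes with tournament selection,
    one-point crossover, integer-flip mutation and elitism."""
    rng = random.Random(seed)
    pop = [[rng.randrange(256) for _ in range(genome_len)] for _ in range(pop_size)]
    best = max(pop, key=lambda g: fitness(g, X, y))
    for _ in range(generations):
        scored = [(fitness(g, X, y), g) for g in pop]

        def tournament():
            a, b = rng.sample(scored, 2)
            return a[1] if a[0] >= b[0] else b[1]

        children = [best]  # elitism: keep the best individual found so far
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            cut = rng.randrange(1, genome_len)
            child = p1[:cut] + p2[cut:]
            child = [rng.randrange(256) if rng.random() < 0.1 else c for c in child]
            children.append(child)
        pop = children
        best = max(pop, key=lambda g: fitness(g, X, y))
    return best, map_genotype(best), fitness(best, X, y)


# Hypothetical, imbalanced toy data: three normalized features per sample,
# label 1 = phishing (2 positives, 6 negatives).
X = [[0.9, 0.1, 0.3], [0.7, 0.6, 0.1], [0.1, 0.4, 0.9], [0.3, 0.2, 0.5],
     [0.2, 0.9, 0.6], [0.4, 0.1, 0.7], [0.1, 0.3, 0.2], [0.3, 0.8, 0.9]]
y = [1, 1, 0, 0, 0, 0, 0, 0]

if __name__ == "__main__":
    genome, rule, score = evolve(X, y)
    print(rule, score)
```

For example, the genome [0, 0, 0, 1] decodes to the rule "x[0] > 0.5", which separates the toy data perfectly and therefore receives fitness 1.0. Optimizing F1 rather than accuracy matters on imbalanced data: on this dataset, a rule that always predicts "legitimate" already reaches 75% accuracy but scores an F1 of 0.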
Published: 2025-09-01

How to Cite
MIRANDA, João Guilherme; BARROS, Mateus L. S. D.; SI, Tapas; SILVA, Carlo Marcelo R.; MIRANDA, Péricles B. C. AutoPhish: A Grammar-based AutoML Approach to Learn Classifiers for Phishing Detection. In: BRAZILIAN SYMPOSIUM ON CYBERSECURITY (SBSEG), 25., 2025, Foz do Iguaçu/PR. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 180-194. DOI: https://doi.org/10.5753/sbseg.2025.9658.
