PhishFL: A Federated Solution for BERT-Based Phishing Detection

Lucca F. T. Nolasco; Andher Paulo C. Santana; Rodolfo I. Meneguette; Vinícius P. Gonçalves; André Luiz M. Serrano; Geraldo P. Rocha Filho

doi:10.5753/sbrc_estendido.2026.23182

Lucca F. T. Nolasco UESB
Andher Paulo C. Santana UnB
Rodolfo I. Meneguette USP
Vinícius P. Gonçalves USP
André Luiz M. Serrano USP
Geraldo P. Rocha Filho UESB

Abstract

This work addresses the automatic detection of phishing e-mails, one of the main threats to digital security. Traditional detection methods, based on static rules or on centralized data, are limited in their ability to adapt and to preserve user privacy. To address this, we propose PhishFL, a solution that uses Federated Learning (FL) for the distributed training of text classification models, allowing multiple clients to collaborate in building a global model without directly sharing sensitive data. The results show that the federated model detects phishing e-mails with performance competitive with the centralized approach. In addition, increasing the number of clients directly affects the stability and accuracy of the model, highlighting the trade-off between performance and privacy in FL.
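
To illustrate how clients can contribute to a global model without sharing raw e-mails, the following is a minimal sketch of one server-side aggregation round. It assumes a FedAvg-style weighted average, which the abstract does not confirm as the aggregation rule used by PhishFL. The function name `federated_average`, the parameter name `classifier`, and the example values are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

# A model is represented here as named lists of floats (an assumption made
# for illustration; a real BERT model would hold tensors instead).
Weights = Dict[str, List[float]]


def federated_average(updates: List[Tuple[Weights, int]]) -> Weights:
    """Average client weights, weighting each client by its local example count.

    Each update is (weights, num_local_examples). Only weights reach the
    server; the clients' e-mails never leave their devices.
    """
    if not updates:
        raise ValueError("at least one client update is required")
    total = sum(n for _, n in updates)
    if total <= 0:
        raise ValueError("total number of local examples must be positive")

    first_weights, _ = updates[0]
    averaged = {name: [0.0] * len(values) for name, values in first_weights.items()}
    for weights, n in updates:
        share = n / total  # fraction of all training examples held by this client
        for name, values in weights.items():
            for i, value in enumerate(values):
                averaged[name][i] += share * value
    return averaged


# Hypothetical round with two clients holding 100 and 300 e-mails.
client_a = ({"classifier": [1.0, 3.0]}, 100)
client_b = ({"classifier": [3.0, 5.0]}, 300)
global_weights = federated_average([client_a, client_b])
print(global_weights)
```

In this sketch the client with more local examples pulls the global weights further toward its own update, so the way data is distributed across clients shapes the aggregated model.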
