A New Approach to Detecting Forged SMTP Headers Using Deep Learning and Synthetic Data Generation
Abstract
This work proposes a new approach for detecting anomalous email headers, focusing on phishing, spam, and legitimate messages. We use a Multilayer Perceptron (MLP) for classification and a Wasserstein Generative Adversarial Network with Gradient Penalty (WGAN-GP) to generate synthetic data. The Gumbel-Softmax technique is employed to simulate characteristics of imbalanced datasets, and the generated data are evaluated with statistical tests. Ray Tune is used to optimize the model's hyperparameters. The results show that the proposed approach improves accuracy and generalization ability in detecting threats in email headers.
References
AbdulNabi, I. and Yaseen, Q. (2021). Spam email detection using deep learning techniques. Procedia Computer Science, 184:853–858. The 12th International Conference on Ambient Systems, Networks and Technologies (ANT) / The 4th International Conference on Emerging Data and Industry 4.0 (EDI40) / Affiliated Workshops.
Arjovsky, M., Chintala, S., and Bottou, L. (2017). Wasserstein GAN.
Beaman, C. and Isah, H. (2022). Anomaly detection in emails using machine learning and header information.
Bountakas, P., Koutroumpouchos, K., and Xenakis, C. (2021). A comparison of natural language processing and machine learning methods for phishing email detection. In Proceedings of the 16th International Conference on Availability, Reliability and Security, ARES ’21, New York, NY, USA. Association for Computing Machinery.
Cormack, G. V. and Lynam, T. R. (2005). TREC 2007 public corpus. Permission is granted for research use only. Publishing the corpus or any part of it is prohibited.
Dhanalakshmi, R., Vijayaraghavan, N., Kumar, A., and Prathiba, B. S. B. (2024). AI-based detection and analysis of phishing domains: Leveraging machine learning for enhanced cybersecurity. In 2024 International Conference on System, Computation, Automation and Networking (ICSCAN), pages 1–6. IEEE.
Franchina, L., Ferracci, S., and Palmaro, F. (2021). Detecting phishing e-mails using text mining and features analysis. In Italian Conference on Cybersecurity.
Greco, M., Chang, R., and Galdames, P. (2024). Educational phishing: An awareness campaign to learn how to detect phishing. In 2024 43rd International Conference of the Chilean Computer Science Society (SCCC), pages 1–5. IEEE.
Gretton, A., Borgwardt, K. M., Rasch, M. J., Schölkopf, B., and Smola, A. (2012). A kernel two-sample test. Journal of Machine Learning Research, 13(25):723–773.
Guan, S. (2023). Performance analysis of convolutional neural networks and multilayer perceptron in generative adversarial networks. In 2023 IEEE 3rd International Conference on Power, Electronics and Computer Applications (ICPECA), pages 817–821.
Gulrajani, I., Ahmed, F., Arjovsky, M., Dumoulin, V., and Courville, A. (2017). Improved training of Wasserstein GANs.
Gupta, S., Pritwani, M., Shrivastava, A., Moharir, M., AR, A. K., et al. (2024). A comprehensive analysis of social engineering attacks: From phishing to prevention-tools, techniques and strategies. In 2024 Second International Conference on Intelligent Cyber Physical Systems and Internet of Things (ICoICI), pages 1–8. IEEE.
II, J. T. W. (2023). headerparser: argparse for mail-style headers. Python library.
Karim, A., Azam, S., Shanmugam, B., and Kannoorpatti, K. (2020). Efficient clustering of emails into spam and ham: The foundational study of a comprehensive unsupervised framework. IEEE Access, 8:154759–154788.
Kaushik, N., Rathore, T. S., and Kumar, P. (2024). Email traceback: Securing systems from phishing and malicious link prevention. In 2024 1st International Conference on Advances in Computing, Communication and Networking (ICAC2N), pages 647–652. IEEE.
Kulkarni, M., Kumar, S., Panjwani, Y., Moharir, M., Kumar, A. A., Baskaran, E., et al. (2024). Mitigating email phishing: analytical framework, simulation models, and preventive measures. In 2024 10th international conference on communication and signal processing (ICCSP), pages 1459–1464. IEEE.
Liaw, R., Liang, E., Nishihara, R., Moritz, P., Gonzalez, J. E., and Stoica, I. (2018). Tune: A research platform for distributed model selection and training. arXiv preprint arXiv:1807.05118.
Lopez-Paz, D. and Oquab, M. (2018). Revisiting classifier two-sample tests.
Luo, E., Young, L., Ho, G., Afifi, M., Schweighauser, M., Katz-Bassett, E., and Cidon, A. (2025). Characterizing the networks sending enterprise phishing emails. In International Conference on Passive and Active Network Measurement, pages 437–466. Springer.
Maddison, C. J., Mnih, A., and Teh, Y. W. (2016). The concrete distribution: A continuous relaxation of discrete random variables. arXiv preprint arXiv:1611.00712.
Nazario, J. (2006). Phishingcorpus homepage. Retrieved June 2024.
Shahila, D. F. D., Rosi, A., Stephen, V., et al. (2024). AI based phishing discrement for immense e-maildata. In 2024 7th International Conference on Circuit Power and Computing Technologies (ICCPCT), volume 1, pages 270–277. IEEE.
Wosah, P. N., Ali Mirza, Q., and Sayers, W. (2024). Analysing the email data using stylometric method and deep learning to mitigate phishing attack. International Journal of Information Technology, pages 1–12.
Yilmaz, I., Masum, R., and Siraj, A. (2020). Addressing imbalanced data problem with generative adversarial network for intrusion detection. In 2020 IEEE 21st International Conference on Information Reuse and Integration for Data Science (IRI), pages 25–30.
Zhou, T., Wu, H.-T., Lu, H., Xu, P., and Cheung, Y.-M. (2022). Password guessing based on GAN with Gumbel-Softmax. Security and Communication Networks, 2022(1):5670629.
Published
September 1, 2025
How to Cite
TAVARES, Patrick M.; MASCARENHAS, Dalbert M. Uma Nova Abordagem para Detecção de Cabeçalhos SMTP Falsos usando Aprendizado Profundo e Geração de Dados Sintéticos. In: SIMPÓSIO BRASILEIRO DE CIBERSEGURANÇA (SBSEG), 25., 2025, Foz do Iguaçu/PR. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 921-937. DOI: https://doi.org/10.5753/sbseg.2025.10418.
