Application of Generative Networks in Spam Detection in Cybersecurity
Abstract
This study evaluates the effectiveness of Machine Learning techniques, including classical algorithms (Naïve Bayes, Random Forest, KNN, SVM, Logistic Regression) and Deep Learning models (BERT, RoBERTa), in spam classification. Focusing on data poisoning attacks, it investigates the unethical use of popular generative networks, such as ChatGPT and Gemini, to create malicious messages capable of bypassing intelligent filters. The research also explores the potential of Dual Contrastive Learning to enhance detection capabilities and draws on data from YouTube and Twitter.
References
Bassiouni, M., Ali, M., and El-Dahshan, E. A. (2018). Ham and spam e-mails classification using machine learning techniques. Journal of Applied Security Research, 13:315–331.
Bhidya, M. (2019). UTKML's Twitter spam detection competition. Available at: [link]. Accessed: Aug. 25, 2023.
Biggio, B., Nelson, B., and Laskov, P. (2013). Poisoning attacks against support vector machines.
Bindu, P. V., Mishra, R., and Thilagam, P. S. (2018). Discovering spammer communities in Twitter. Journal of Intelligent Information Systems, 51:503–527.
Chen, Q., Zhang, R., Zheng, Y., and Mao, Y. (2022a). Dual contrastive learning: Text classification via label-aware data augmentation.
Chen, X., Dong, Y., Sun, Z., Zhai, S., Shen, Q., and Wu, Z. (2022b). Kallima: A clean-label framework for textual backdoor attacks. In Atluri, V., Di Pietro, R., Jensen, C. D., and Meng, W., editors, Computer Security – ESORICS 2022, pages 447–466. Springer International Publishing.
Dada, E. G., Bassi, J. S., Chiroma, H., Abdulhamid, S. M., Adetunmbi, A. O., and Ajibuwa, O. E. (2019). Machine learning for email spam filtering: review, approaches and open research problems. Heliyon, 5(6):e01802.
Derner, E. and Batistič, K. (2023). Beyond the safeguards: Exploring the security risks of ChatGPT.
Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding.
Hu, C. and Hu, Y.-H. F. (2020). Data poisoning on deep learning models. In 2020 International Conference on Computational Science and Computational Intelligence (CSCI), pages 628–632.
Hui, L. and Belkin, M. (2021). Evaluation of neural architectures trained with square loss vs cross-entropy in classification tasks.
Islam, R. and Moushi, O. M. (2024). GPT-4o: The cutting-edge advancement in multimodal LLM.
Janiesch, C., Zschech, P., and Heinrich, K. (2021). Machine learning and deep learning. Electronic Markets, 31:685–695.
Li, J., Yang, Y., Wu, Z., Vydiswaran, V. G. V., and Xiao, C. (2023). ChatGPT as an attack tool: Stealthy textual backdoor attack via blackbox generative model trigger.
Lichman, M. (2017). YouTube spam collection data set. Available at: [link]. Accessed: Aug. 23, 2023.
NaliniPriya, G. and Asswini, M. (2015). A survey on vulnerable attacks in online social networks. International Conference on Innovation Information in Computing Technologies, pages 1–6.
Rao, S., Verma, A. K., and Bhatia, T. (2021). A review on social spam detection: Challenges, open issues, and future directions. Expert Systems with Applications, 186.
Team, G. (2024). Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context. arXiv e-prints, page arXiv:2403.05530.
Utaliyeva, A., Pratiwi, M., Park, H., and Choi, Y.-H. (2023). ChatGPT: A threat to spam filtering systems. pages 1043–1050.
Wang, Q., Ma, Y., Zhao, K., and Tian, Y. (2020). A comprehensive survey of loss functions in machine learning. Annals of Data Science, pages 1–26.
Yerlikaya, F. A. and Bahtiyar, Ş. (2022). Data poisoning attacks against machine learning algorithms. Expert Systems with Applications, 208:118101.
Zhang, X. and Ghorbani, A. A. (2020). An overview of online fake news: Characterization, detection, and discussion. Information Processing & Management, 57.
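The abstract names Naïve Bayes among the classical baselines and frames the threat as data poisoning. The sketch below illustrates both ideas under stated assumptions: a from-scratch multinomial Naïve Bayes with add-one smoothing (not the authors' implementation, which the abstract does not describe), a handful of hypothetical toy messages, and poisoning modeled as injecting spam-worded messages labeled "ham" into the training set, standing in for the generative-network messages the study investigates. The names `NaiveBayesSpamFilter`, `tokenize`, `target`, and `poison_texts` are illustrative only.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep alphanumeric runs as tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesSpamFilter:
    """Hypothetical minimal multinomial Naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, texts: list[str], labels: list[str]) -> "NaiveBayesSpamFilter":
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)  # documents per class, for the prior
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def log_posterior(self, text: str, c: str) -> float:
        """Unnormalized log P(c | text) = log P(c) + sum of log P(token | c)."""
        n_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[c] / n_docs)
        v = len(self.vocab)
        for tok in tokenize(text):
            if tok in self.vocab:  # tokens never seen in training are skipped
                score += math.log((self.word_counts[c][tok] + 1) / (self.total_words[c] + v))
        return score

    def predict(self, text: str) -> str:
        return max(self.classes, key=lambda c: self.log_posterior(text, c))


# Hypothetical toy comments, not taken from the YouTube or Twitter datasets.
train_texts = [
    "win a free prize now click here",
    "free money click the link now",
    "subscribe to my channel for free gifts",
    "great video thanks for sharing",
    "i love this song so much",
    "nice explanation of the topic",
]
train_labels = ["spam"] * 3 + ["ham"] * 3
target = "click here to win free money"

clean_model = NaiveBayesSpamFilter().fit(train_texts, train_labels)

# Hypothetical poison samples: spam vocabulary mixed with benign words, labeled "ham".
poison_texts = [
    "click here to win free money great video",
    "win free money click here nice song",
    "free money here click to win love this",
]
poisoned_model = NaiveBayesSpamFilter().fit(
    train_texts + poison_texts, train_labels + ["ham"] * len(poison_texts)
)

print(clean_model.predict(target))     # spam
print(poisoned_model.predict(target))  # ham
```

With these toy counts the clean model labels the target message as spam, while the poisoned model labels it as ham, because the injected samples raise P(token | ham) for exactly the words the target uses and also shift the class prior toward ham. A real evaluation would measure held-out metrics over a full dataset rather than a single message.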
Published
2025-09-01
How to Cite
ARAUJO, Milena de Toledo; COSTA, Kelton Augusto Pontara da. Application of Generative Networks in Spam Detection in Cybersecurity. In: WORKSHOP ON SCIENTIFIC INITIATION AND UNDERGRADUATE WORKS - BRAZILIAN SYMPOSIUM ON CYBERSECURITY (SBSEG), 25., 2025, Foz do Iguaçu/PR. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 147-158. DOI: https://doi.org/10.5753/sbseg_estendido.2025.11696.
