Evaluation of Generative Data Models and Artificial Neural Networks for Fake News Classification on the Web

  • Daniela Deboni Silva de Mello (IFB)
  • Gabriela Barbosa Oliveira (IFB)
  • João Gabriel Rocha Silva (IFB)

Abstract


This work investigates whether synthetic data produced by generative modeling techniques, combined with artificial neural networks, can improve fake news detection. Because real datasets are limited, the proposal uses synthetic samples to expand and diversify the training set. The reported accuracies are moderate, which is common in fake news detection given the complexity of the problem. The results indicate that adding synthetic data improves the classifier's performance, making the approach promising for automatically identifying false news and for strengthening the reliability of information.

Published
2025-10-16
MELLO, Daniela Deboni Silva de; OLIVEIRA, Gabriela Barbosa; SILVA, João Gabriel Rocha. Evaluation of Generative Data Models and Artificial Neural Networks for Fake News Classification on the Web. In: REGIONAL SCHOOL OF INFORMATICS OF ESPÍRITO SANTO (ERI-ES), 10., 2025, Espírito Santo/ES. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 1-10. DOI: https://doi.org/10.5753/eries.2025.15179.