Evaluating the Performance of Generative Data Models for Fake News Classification
Abstract
This paper investigates whether generative models can produce synthetic data that improves fake news detection. The study compares classification results obtained on a real news dataset with those obtained on four synthetic datasets generated with a generative adversarial network (GAN), a variational autoencoder (VAE), a denoising diffusion probabilistic model (DDPM), and the Synthetic Minority Over-sampling Technique (SMOTE). Classification performance improved when synthetic data was used, reaching an accuracy of approximately 87%. These results suggest that synthetic data can be a valuable tool for improving fake news classification.
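SMOTE is the only one of the four generators that does not learn a model of the data distribution. The sketch below assumes the minority class is already encoded as numeric feature vectors; the paper's actual features and implementation are not shown here. It illustrates the classic SMOTE interpolation step in plain Python. The names `smote` and `k_nearest`, and the parameters `k` and `seed`, are hypothetical and do not come from the paper.

```python
import math
import random


def k_nearest(points, idx, k):
    """Return indices of the k nearest neighbours of points[idx] (Euclidean), excluding itself."""
    others = [j for j in range(len(points)) if j != idx]
    others.sort(key=lambda j: math.dist(points[idx], points[j]))
    return others[:k]


def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic rows by SMOTE-style interpolation.

    Hypothetical helper: assumes `minority` is a list of equal-length
    numeric feature vectors from a single class. Each synthetic row lies
    on the segment between a randomly chosen minority row and one of its
    k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    # At least one neighbour, and never more than the other available rows.
    k = max(1, min(k, len(minority) - 1))
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        j = rng.choice(k_nearest(minority, i, k))
        gap = rng.random()  # uniform in [0, 1)
        x, nn = minority[i], minority[j]
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nn)])
    return synthetic


# Illustrative toy feature vectors (not data from the paper).
toy_fake_news_features = [[0.1, 0.9], [0.2, 0.8], [0.15, 0.85], [0.3, 0.7]]
new_rows = smote(toy_fake_news_features, n_new=3, k=2)
print(len(new_rows))  # 3
```

Because each new row is a convex combination of two real rows, SMOTE cannot produce feature values outside the range spanned by the real data. GAN, VAE, and DDPM instead learn a distribution and sample new rows from it.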
