On the use of Large Language Models to Detect Brazilian Politics Fake News

  • Marcos P. S. Gôlo USP
  • Adriel L. V. Mori UFG
  • William G. Oliveira UFG
  • Jacson R. Barbosa UFG
  • Valdemar V. Graciano-Neto UFG
  • Eliomar A. de Lima UFG
  • Ricardo M. Marcacini USP

Abstract

Machine learning methods have been proposed to mitigate the spread of fake Brazilian news about politics and the harm it causes to society. Supervised algorithms are commonly explored, but they require labeled news for training, and labeling a high volume of news can be complex, onerous, time-consuming, error-prone, and costly. Hence, large language models (LLMs) have been used to detect fake news, since LLMs can act as classifiers without requiring labeled training data. However, most fake news detection studies explore OpenAI LLMs (which require payment) and lack an empirical evaluation with other LLMs, even though several open-source models achieve comparable, state-of-the-art (SOTA) results. We highlight that these models have yet to be explored for detecting fake Brazilian news about politics, a domain that directly impacts society. In this sense, we propose a new dataset for detecting fake Brazilian news about politics and an empirical evaluation of open-source LLMs and OpenAI LLMs. In our results, the LLM from Google (Gemma) outperformed the other six LLMs, including GPT-4, proving to be the most promising model for detecting fake news about Brazilian politics.
Keywords: Large Language Models, Fake Politics News, Fake Brazilian News
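
The evaluation described in the abstract treats each LLM as a zero-shot classifier: the model receives a news item in a prompt and is asked whether it is fake or real, with no task-specific training. The Python sketch below illustrates this setup with the Hugging Face transformers library; the checkpoint name, prompt wording, and answer parsing are illustrative assumptions, not the authors' exact experimental protocol.

# Minimal, hypothetical sketch: prompt an open-source instruction-tuned LLM
# to label a news item as fake or real. Checkpoint, prompt, and parsing are
# assumptions for illustration only.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="google/gemma-2-9b-it",  # assumed checkpoint; any instruction-tuned LLM works
    device_map="auto",
)

def classify_news(text: str) -> str:
    """Return 'fake' or 'real' for a Brazilian political news item."""
    prompt = (
        "You are a fact-checking assistant for Brazilian political news.\n"
        f"News item: {text}\n"
        "Answer with a single word, 'fake' or 'real': "
    )
    output = generator(
        prompt,
        max_new_tokens=5,
        do_sample=False,          # deterministic decoding for classification
        return_full_text=False,   # keep only the generated continuation
    )
    answer = output[0]["generated_text"].strip().lower()
    return "fake" if "fake" in answer else "real"

print(classify_news("Candidato prometeu abolir todas as eleições futuras."))

Under this setup, comparing several open-source checkpoints and the OpenAI API with the same prompt reduces to swapping the model behind classify_news.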

References

Alibaba (2023). Qwen technical report.

Benny, J. J. (2023). Knowledge Informed Fake News Detection Using Large Language Models. PhD thesis, University of Windsor (Canada).

Chang, Y., Wang, X., Wang, J., Wu, Y., Yang, L., Zhu, K., Chen, H., Yi, X., Wang, C., Wang, Y., et al. (2023). A survey on evaluation of large language models. ACM Transactions on Intelligent Systems and Technology.

Gemma (2024). Gemma 2: Improving open language models at a practical size.

Gôlo, M. P. S., de Souza, M. C., Rossi, R. G., Rezende, S. O., Nogueira, B. M., and Marcacini, R. M. (2023). One-class learning for fake news detection through multimodal variational autoencoders. Engineering Applications of Artificial Intelligence.

Hu, B., Sheng, Q., Cao, J., Shi, Y., Li, Y., Wang, D., and Qi, P. (2024). Bad actor, good advisor: Exploring the role of large language models in fake news detection. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 38.

Junkert, F. G. (2022). Fake news and the 2018 Brazilian presidential election. In The Rule of Law in Cyberspace, pages 167–185. Springer.

Köhler, J., Shahi, G. K., Struß, J. M., Wiegand, M., Siegel, M., and Mandl, T. (2022). Overview of the CLEF-2022 CheckThat! lab task 3 on fake news detection. In Working Notes of CLEF 2022—Conference and Labs of the Evaluation Forum.

Li, X., Zhang, Y., and Malthouse, E. C. (2024). Large language model agent for fake news detection. arXiv preprint arXiv:2405.01593.

Liu, X., Zheng, Y., Du, Z., Ding, M., Qian, Y., Yang, Z., and Tang, J. (2023). GPT understands, too. AI Open.

Meta (2024). The Llama 3 herd of models.

Microsoft (2024). Phi-3 technical report: A highly capable language model locally on your phone.

Mishra, S., Shukla, P., and Agarwal, R. (2022). Analyzing machine learning enabled fake news detection techniques for diversified datasets. Wireless Communications and Mobile Computing, 2022(1):1575365.

Nan, Q., Cao, J., Zhu, Y., Wang, Y., and Li, J. (2021). MDFEND: Multi-domain fake news detection. In Proceedings of the 30th ACM International Conference on Information & Knowledge Management, pages 3343–3347.

OpenAI (2024). GPT-4 technical report.

Pelrine, K., Reksoprodjo, M., Gupta, C., Christoph, J., and Rabbany, R. (2023). Towards reliable misinformation mitigation: Generalization, uncertainty, and GPT-4. arXiv preprint.

Qu, Z., Meng, Y., Muhammad, G., and Tiwari, P. (2024). QMFND: A quantum multimodal fusion-based fake news detection model for social media. Information Fusion.

Rohera, D., Shethna, H., Patel, K., Thakker, U., Tanwar, S., Gupta, R., Hong, W.-C., and Sharma, R. (2022). A taxonomy of fake news classification techniques: Survey and implementation aspects. IEEE Access, 10:30367–30394.

Roumeliotis, K. I., Tselikas, N. D., and Nasiopoulos, D. K. (2024). LLMs in e-commerce: A comparative analysis of GPT and Llama models in product review evaluation. Natural Language Processing Journal, 6:100056.

Santos, R. L. d. S. (2022). Detecção automática de notícias falsas em português. PhD thesis, Universidade de São Paulo.

Shu, K., Mahudeswaran, D., Wang, S., Lee, D., and Liu, H. (2020). FakeNewsNet: A data repository with news content, social context and spatiotemporal information for studying fake news on social media. Big Data, 8:171–188.

Souza, M. C. d. (2023). Detecção de notícias falsas usando poucos dados positivos rotulados. PhD thesis, Universidade de São Paulo.

Teo, T. W., Chua, H. N., Jasser, M. B., and Wong, R. T. (2024). Integrating large language models and machine learning for fake news detection. In 2024 20th IEEE International Colloquium on Signal Processing & Its Applications (CSPA), pages 102–107. IEEE.

Van der Maaten, L. and Hinton, G. (2008). Visualizing data using t-SNE. Journal of Machine Learning Research, 9(11):2579–2605.

Wang, W. Y. (2017). "Liar, liar pants on fire": A new benchmark dataset for fake news detection. In Annual Meeting of the ACL, pages 422–426.

Zhao, W. X., Zhou, K., Li, J., Tang, T., Wang, X., Hou, Y., Min, Y., Zhang, B., Zhang, J., Dong, Z., et al. (2023). A survey of large language models. arXiv preprint arXiv:2303.18223.
Published
17/11/2024
GÔLO, Marcos P. S.; MORI, Adriel L. V.; OLIVEIRA, William G.; BARBOSA, Jacson R.; GRACIANO-NETO, Valdemar V.; LIMA, Eliomar A. de; MARCACINI, Ricardo M. On the use of Large Language Models to Detect Brazilian Politics Fake News. In: ENCONTRO NACIONAL DE INTELIGÊNCIA ARTIFICIAL E COMPUTACIONAL (ENIAC), 21., 2024, Belém/PA. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2024. p. 1-12. ISSN 2763-9061. DOI: https://doi.org/10.5753/eniac.2024.245119.
