Fake News Classification in Portuguese Using Transformer-Based Models
Abstract
The rapid growth of internet and social media usage has contributed to the widespread dissemination of so-called fake news. The alarming proportions this phenomenon has reached suggest a gap in the fight against misinformation. This study employs classification models based on the Transformer neural network architecture for the task of fake news classification in texts written in Portuguese. To this end, three distinct models were developed: (1) Encoder-Only, (2) Decoder-Only, and (3) Transformer (Encoder-Decoder), all trained on the same dataset, obtained by merging two corpora. In addition, several pre-trained models were evaluated and their results compared with those of the proposed models. In summary, all of the developed Transformer models demonstrated superior performance, with particular emphasis on the Encoder-Only model, which achieved accuracy and precision values exceeding 96.7%.
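The abstract does not include implementation details, so the sketch below is purely illustrative: a minimal encoder-only Transformer classifier in PyTorch for the binary fake/real task. All hyperparameters (embedding size, number of layers and heads, vocabulary size, sequence length) and the mean-pooling classification head are assumptions made for this example, not the configuration reported in the study.

```python
import torch
import torch.nn as nn

class EncoderOnlyClassifier(nn.Module):
    """Minimal encoder-only Transformer for binary (fake/real) news classification.
    Hyperparameters are illustrative defaults, not the paper's settings."""
    def __init__(self, vocab_size, d_model=128, nhead=4, num_layers=2,
                 dim_feedforward=256, max_len=256, num_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.pos = nn.Embedding(max_len, d_model)  # learned positional embeddings
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=nhead,
            dim_feedforward=dim_feedforward, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.classifier = nn.Linear(d_model, num_classes)

    def forward(self, token_ids, pad_mask=None):
        # token_ids: (batch, seq_len) integer ids; pad_mask: True where padding
        positions = torch.arange(token_ids.size(1), device=token_ids.device)
        x = self.embed(token_ids) + self.pos(positions)
        x = self.encoder(x, src_key_padding_mask=pad_mask)
        # Mean-pool over non-padded tokens, then project to class logits.
        if pad_mask is not None:
            keep = (~pad_mask).unsqueeze(-1).float()
            x = (x * keep).sum(dim=1) / keep.sum(dim=1).clamp(min=1.0)
        else:
            x = x.mean(dim=1)
        return self.classifier(x)

# Smoke test with random token ids (hypothetical vocabulary of 30,000 subwords).
model = EncoderOnlyClassifier(vocab_size=30_000)
ids = torch.randint(0, 30_000, (8, 256))
logits = model(ids)   # shape: (8, 2) -> fake vs. real
print(logits.shape)
```

An encoder-only design of this kind mirrors BERT-style classifiers: the encoder produces contextual token representations, which are pooled into a single vector and mapped by a linear layer onto the two classes; the decoder-only and encoder-decoder variants mentioned in the abstract would differ mainly in attention masking and in how the sequence representation is extracted.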
