Survey and Qualitative Analysis of Fake News Datasets in Portuguese
Abstract
The spread of fake news on social media is an increasingly serious problem that directly influences public opinion. Artificial intelligence algorithms are used to combat it, but their effectiveness depends on the quality of the datasets available to them. Only a limited number of such datasets exist for the Portuguese language. This study therefore surveys fake news datasets in Portuguese, focusing specifically on the Brazilian context. Among the findings, the identified datasets stand out for their small number of instances compared with English-language datasets.
Published
2025-07-20
How to Cite
BARACHO, Juliana Karla de C. M.; LISBOA, Lucas A.; LOPES, Roberta Vilhena V. Survey and Qualitative Analysis of Fake News Datasets in Portuguese. In: WORKSHOP ON THE IMPLICATIONS OF COMPUTING IN SOCIETY (WICS), 6., 2025, Maceió/AL. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 169-180. ISSN 2763-8707. DOI: https://doi.org/10.5753/wics.2025.9464.
