A Moving Target: Detecting Concept Drift in Brazilian Portuguese Fake News

  • Manuela Guedes Wanderley USP
  • Lucca Baptista Silva Ferraz USP
  • Tiago Agostinho Almeida UFSCar
  • Renato Moraes Silva USP

Abstract


Static fake news detectors trained on offline data degrade over time because of concept drift, a phenomenon that remains understudied outside English. This paper presents the first large-scale analysis of concept drift in Brazilian Portuguese fake news. Combining statistical two-sample tests, semantic similarity measures, and nonparametric change point detection, we quantify the presence and impact of drift and make it explainable by identifying the points in time at which shifts occur. Our results reveal significant shifts in topical and semantic patterns and show that the performance of models trained on older data can degrade considerably. These findings underscore the need for adaptive, time-aware methods and for the curation of temporally diverse datasets to build robust defenses against online misinformation in Brazilian Portuguese. The source code for our experiments is publicly available at https://github.com/GDSMN/STIL2025_conceptdrift.
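To make the idea of a statistical two-sample test for drift concrete, the following is a minimal sketch of a kernel two-sample test (maximum mean discrepancy with a permutation-based p-value) applied to two time windows. It is not the authors' implementation, which is in the repository linked above. The synthetic Gaussian vectors stand in for document embeddings, and the function names, the median-heuristic bandwidth, the window sizes, and the number of permutations are all assumptions chosen for illustration.

```python
import numpy as np


def rbf_kernel(a, b, gamma):
    """Gaussian kernel matrix between the rows of a and the rows of b."""
    sq = (a ** 2).sum(1)[:, None] + (b ** 2).sum(1)[None, :] - 2.0 * a @ b.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def mmd2(x, y, gamma):
    """Biased estimate of the squared maximum mean discrepancy."""
    return (rbf_kernel(x, x, gamma).mean()
            + rbf_kernel(y, y, gamma).mean()
            - 2.0 * rbf_kernel(x, y, gamma).mean())


def mmd_permutation_test(x, y, n_perm=200, seed=0):
    """Return (statistic, p-value) for H0: x and y share one distribution."""
    rng = np.random.default_rng(seed)
    pooled = np.vstack([x, y])
    # Bandwidth from the median heuristic (an assumption, not from the paper).
    sq = ((pooled[:, None, :] - pooled[None, :, :]) ** 2).sum(-1)
    gamma = 1.0 / np.median(sq[sq > 0])
    observed = mmd2(x, y, gamma)
    exceed = 0
    for _ in range(n_perm):
        # Shuffle window membership to simulate the no-drift hypothesis.
        idx = rng.permutation(len(pooled))
        px, py = pooled[idx[:len(x)]], pooled[idx[len(x):]]
        exceed += int(mmd2(px, py, gamma) >= observed)
    p_value = (exceed + 1) / (n_perm + 1)
    return float(observed), float(p_value)


# Synthetic stand-ins for document embeddings from different time windows.
rng = np.random.default_rng(42)
old_window = rng.normal(0.0, 1.0, size=(60, 8))
same_window = rng.normal(0.0, 1.0, size=(60, 8))      # no drift
shifted_window = rng.normal(1.0, 1.0, size=(60, 8))   # mean shift = drift

stat_same, p_same = mmd_permutation_test(old_window, same_window)
stat_shift, p_shift = mmd_permutation_test(old_window, shifted_window)
print(f"no drift: MMD^2={stat_same:.4f}, p={p_same:.3f}")
print(f"drift:    MMD^2={stat_shift:.4f}, p={p_shift:.3f}")
```

In a real setting the two arrays would hold embeddings of news articles from an older and a newer period, and a small p-value would suggest that the two periods differ in distribution.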

Published
2025-09-29
WANDERLEY, Manuela Guedes; FERRAZ, Lucca Baptista Silva; ALMEIDA, Tiago Agostinho; SILVA, Renato Moraes. A Moving Target: Detecting Concept Drift in Brazilian Portuguese Fake News. In: BRAZILIAN SYMPOSIUM IN INFORMATION AND HUMAN LANGUAGE TECHNOLOGY (STIL), 16., 2025, Fortaleza/CE. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 490-501. DOI: https://doi.org/10.5753/stil.2025.37849.