How concept drift can impair the classification of fake news

Renato M. Silva; Tiago A. Almeida

doi:10.5753/kdmile.2021.17469

Renato M. Silva FACENS / UFSCar http://orcid.org/0000-0001-6687-8981
Tiago A. Almeida UFSCar http://orcid.org/0000-0001-6943-8033

Abstract

Fake news is a serious problem: it can influence political choices, harm people's physical and mental health, promote treatments without scientific evidence, and even incite violence. Machine learning methods are among the leading solutions studied for filtering fake news automatically. However, most studies ignore the dynamic nature of news: they build static models and evaluate them offline with traditional holdout or cross-validation. These studies naively assume that news characteristics do not change over time and that the performance of offline models is therefore preserved. In this study, we show how concept drift can impair the classification of fake news. We aim to verify whether the conclusions of studies that disregarded the dynamic nature of news still hold. We analyzed how the performance of models trained offline is affected as news content changes over time, including concept drift caused by high-impact events such as the COVID-19 pandemic and the United States presidential election. The results show that performance estimates for offline models are overoptimistic. Incremental learning methods should be preferred because they can adapt to changes in textual patterns over time.

Keywords: fake news, online learning, text categorization, machine learning
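The contrast between offline evaluation and incremental learning can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the study: the stream is synthetic and noise-free, the tokens ("cure", "ballot", "senate", "news") are invented, and a plain perceptron stands in for whatever classifiers the authors actually evaluated. A frozen model is scored once on a holdout drawn from the same period as its training data and again after fake news shifts to an unseen topic, while an incremental model is scored in test-then-train order on the same post-drift items.

```python
class Perceptron:
    """Tiny bag-of-words perceptron; label 1 = fake, 0 = real (illustrative only)."""

    def __init__(self):
        self.weights = {}

    def predict(self, tokens):
        score = sum(self.weights.get(t, 0.0) for t in tokens)
        return 1 if score > 0 else 0

    def learn(self, tokens, label):
        # Update the weights only when the current prediction is wrong.
        if self.predict(tokens) != label:
            delta = 1.0 if label == 1 else -1.0
            for t in tokens:
                self.weights[t] = self.weights.get(t, 0.0) + delta


def make_stream(n_per_phase=200):
    """Hypothetical stream: fake items mention 'cure' first, then 'ballot'."""
    stream = []
    for fake_topic in ("cure", "ballot"):  # the second phase is the drift
        for i in range(n_per_phase):
            label = i % 2
            topic = fake_topic if label == 1 else "senate"
            stream.append(([topic, "news"], label))
    return stream


def accuracy(model, examples, update):
    """Score each item first; optionally learn from it afterwards."""
    hits = 0
    for tokens, label in examples:
        hits += model.predict(tokens) == label  # test ...
        if update:
            model.learn(tokens, label)          # ... then train
    return hits / len(examples)


stream = make_stream()
train, holdout, after_drift = stream[:100], stream[100:200], stream[200:]

static, incremental = Perceptron(), Perceptron()
for tokens, label in train:
    static.learn(tokens, label)
    incremental.learn(tokens, label)

holdout_acc = accuracy(static, holdout, update=False)           # offline estimate
static_acc = accuracy(static, after_drift, update=False)        # frozen model
incremental_acc = accuracy(incremental, after_drift, update=True)

print(f"offline holdout estimate: {holdout_acc:.3f}")
print(f"static model after drift: {static_acc:.3f}")
print(f"incremental after drift:  {incremental_acc:.3f}")
```

On this toy stream the frozen model scores 1.0 on the holdout but only 0.5 after the drift, because it has never seen the new fake-news topic. The incremental model misses just the first post-drift fake item and then recovers. Real news streams are noisier and drift more gradually, so the gap here is deliberately exaggerated.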
