Detection of Misinformation about COVID-19 in Brazilian Portuguese WhatsApp Messages Using Deep Learning

Antônio Diogo Forte Martins; Lucas Cabral; Pedro Jorge Chaves Mourão; José Maria Monteiro; Javam Machado

doi:10.5753/sbbd.2021.17868

Antônio Diogo Forte Martins, Universidade Federal do Ceará (UFC)
Lucas Cabral, Universidade Federal do Ceará (UFC)
Pedro Jorge Chaves Mourão, Universidade Estadual do Ceará (UECE)
José Maria Monteiro, Universidade Federal do Ceará (UFC)
Javam Machado, Universidade Federal do Ceará (UFC)


Abstract

During the COVID-19 pandemic, misinformation once again spread through social networks as an epidemic of harmful health advice and false solutions. In Brazil, as in many developing countries, one of the primary sources of misinformation is the messaging application WhatsApp. Automatic misinformation detection (MID) about COVID-19 in Brazilian Portuguese WhatsApp messages is therefore a crucial challenge. However, because WhatsApp messaging is private, few misinformation detection methods have been developed specifically for the platform. In this paper, we propose a new approach, called MIDeepBR, based on BiLSTM neural networks, pooling operations, and an attention mechanism, which automatically detects misinformation in Brazilian Portuguese WhatsApp messages. Experimental results demonstrate the suitability of the proposed approach for automatic misinformation detection. Our best result was an F1 score of 0.834, whereas the best result in previous work was an F1 score of 0.778. Thus, MIDeepBR outperforms the previous approaches.
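As a reminder of what the reported metric measures, the following is a minimal sketch of how an F1 score can be computed for a binary misinformation classifier. The function name `f1_score`, the toy labels, and the choice of treating label 1 as the "misinformation" class are illustrative assumptions. The abstract does not state which averaging scheme the authors used.

```python
def f1_score(y_true, y_pred, positive=1):
    """F1 for one class: the harmonic mean of precision and recall.

    Assumption: label `positive` marks a message as misinformation.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels, invented for illustration only (not data from the paper).
y_true = [1, 1, 1, 0, 0, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1, 1, 0, 0]
score = f1_score(y_true, y_pred)

# Absolute gap between the two F1 scores quoted in the abstract.
reported_gain = 0.834 - 0.778
```

On the toy labels there are three true positives, one false positive, and one false negative, so precision and recall are both 0.75 and so is the F1 score. The two figures quoted in the abstract differ by about 0.056 in absolute terms.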

Keywords: COVID-19, Coronavirus, Misinformation, WhatsApp
