COVID19.BR: A Dataset of Misinformation about COVID-19 in Brazilian Portuguese WhatsApp Messages

Antônio Diogo Forte Martins; Lucas Cabral; Pedro Jorge Chaves Mourão; Ivandro Claudino de Sá; José Maria Monteiro; Javam Machado

doi:10.5753/dsw.2021.17422

Antônio Diogo Forte Martins Federal University of Ceará (UFC)
Lucas Cabral Federal University of Ceará (UFC)
Pedro Jorge Chaves Mourão State University of Ceará (UECE)
Ivandro Claudino de Sá Federal University of Ceará (UFC)
José Maria Monteiro Federal University of Ceará (UFC)
Javam Machado Federal University of Ceará (UFC)

Abstract

Misinformation spread through social networks is a major problem for our society, and it is becoming more and more serious. In Brazil, the primary source of misinformation is the messaging application WhatsApp. However, due to WhatsApp's private messaging nature, there are still few misinformation data sets built specifically from this platform. In this context, building a data set of WhatsApp messages about COVID-19 in Brazilian Portuguese and labeling the misinformation messages within it becomes a crucial challenge. In this work, we present COVID-19.BR, a data set of WhatsApp messages about the coronavirus in Brazilian Portuguese, collected from Brazilian public groups and manually labeled.

Keywords: COVID-19, Coronavirus, Misinformation, WhatsApp
