COVID19.BR: A Dataset of Misinformation about COVID-19 in Brazilian Portuguese WhatsApp Messages

  • Antônio Diogo Forte Martins Universidade Federal do Ceará (UFC)
  • Lucas Cabral Universidade Federal do Ceará (UFC)
  • Pedro Jorge Chaves Mourão Universidade Estadual do Ceará (UECE)
  • Ivandro Claudino de Sá Universidade Federal do Ceará (UFC)
  • José Maria Monteiro Universidade Federal do Ceará (UFC)
  • Javam Machado Universidade Federal do Ceará (UFC)

Abstract


Misinformation is a major societal problem that keeps growing, and social networks are once again its main vehicle. In Brazil, the primary source of misinformation is the messaging application WhatsApp. However, because WhatsApp messaging is private, there are still few misinformation data sets built specifically from this platform. In this context, building a data set of WhatsApp messages about COVID-19 in Brazilian Portuguese and labeling the misinformation messages within it is a crucial challenge. In this work, we present COVID-19.BR, a data set of WhatsApp messages about the coronavirus in Brazilian Portuguese, collected from Brazilian public groups and manually labeled.
Keywords: COVID19, Coronavirus, Misinformation, WhatsApp

Published
04/10/2021
How to Cite

MARTINS, Antônio Diogo Forte; CABRAL, Lucas; MOURÃO, Pedro Jorge Chaves; DE SÁ, Ivandro Claudino; MONTEIRO, José Maria; MACHADO, Javam. COVID19.BR: A Dataset of Misinformation about COVID-19 in Brazilian Portuguese WhatsApp Messages. In: DATASET SHOWCASE WORKSHOP (DSW), 3., 2021, Rio de Janeiro. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2021. p. 138-147. DOI: https://doi.org/10.5753/dsw.2021.17422.