Automatic Detection of Covid-19-Related Misinformation in Brazil

João M. M. Couto; Isadora Salles; Breno Pimenta; Samuel Assis; Leandro Araújo; Julio C. S. Reis; Fabrício Benevenuto

doi:10.5753/sbbd.2022.224326

João M. M. Couto Universidade Federal de Minas Gerais (UFMG) http://orcid.org/0000-0003-1706-2497
Isadora Salles Universidade Federal de Minas Gerais (UFMG)
Breno Pimenta Universidade Federal de Minas Gerais (UFMG) https://orcid.org/0000-0002-4877-6727
Samuel Assis Universidade Federal de Minas Gerais (UFMG) https://orcid.org/0000-0001-8882-6525
Leandro Araújo Universidade Federal de Minas Gerais (UFMG)
Julio C. S. Reis Universidade Federal de Viçosa (UFV)
Fabrício Benevenuto Universidade Federal de Minas Gerais (UFMG) https://orcid.org/0000-0001-6875-6259

Abstract

The spread of fake news affects several areas that are crucial to democratic governance. Many approaches to identifying such news rely on information captured only after it has already spread across social networks. We propose a methodology for detecting it at an early stage of propagation. We carry out an exploratory analysis that involves training thousands of models with diverse sets of parameters and of textual features extracted from suspicious news items. In the process, we built a novel dataset of Covid-19-related fake news that circulated in Brazil. The results reveal the most relevant feature sets and the power of supervised classifiers for this problem in Brazil.
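
To give a sense of how an exploratory analysis of this kind can reach a large number of trained models, the following minimal sketch enumerates every pairing of a feature subset with a hyperparameter setting. The feature-group names, the parameter names, and their values are hypothetical placeholders chosen for illustration; the abstract does not specify them, and the sketch only builds the experiment plan, it does not train any classifier.

```python
from itertools import combinations, product

# Hypothetical feature groups and hyperparameter grid (not taken from the paper).
FEATURE_GROUPS = ["lexical", "syntactic", "sentiment", "readability"]
PARAM_GRID = {"n_estimators": [100, 300], "max_depth": [4, 8, None]}


def feature_subsets(groups):
    """Yield every non-empty combination of feature groups."""
    for size in range(1, len(groups) + 1):
        yield from combinations(groups, size)


def param_settings(grid):
    """Yield one dict per point of the Cartesian product of the grid."""
    names = sorted(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


def experiment_plan(groups, grid):
    """Pair each feature subset with each hyperparameter setting."""
    return [
        (subset, params)
        for subset in feature_subsets(groups)
        for params in param_settings(grid)
    ]


plan = experiment_plan(FEATURE_GROUPS, PARAM_GRID)
print(len(plan))  # 15 feature subsets x 6 parameter settings = 90
```

Even with four feature groups and six parameter settings the plan already holds 90 configurations; the number of subsets doubles with each added feature group, so a modest increase in groups, parameter values, or classifier families quickly yields thousands of models.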

Keywords: COVID-19, Fake News, Misinformation, Automatic Detection, Classification, Machine Learning
