Fake News Detection about Covid-19 in the Portuguese Language

  • Anísio Pereira Batista Filho UNIVASF
  • Débora da Conceição Araújo UNIVASF / UFPE
  • Máverick André Dionísio Ferreira UFPE
  • Paulo Salgado Gomes de Mattos Neto UFPE

Abstract


The spread of fake news has been a problem across many sectors of society and has hindered efforts to fight the pandemic caused by the novel coronavirus (SARS-CoV-2). Combating misinformation about SARS-CoV-2, especially on social networks, is fundamentally important for controlling the spread of the virus and, consequently, the pandemic. To that end, this work builds supervised learning models aimed at identifying fake news about the novel coronavirus. In total, 18 models were built and evaluated, reaching F-scores of up to 0.62, 0.82, and 0.47 for the classes considered (news, opinion, and fake).
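As a reading aid for the per-class F-scores reported above, the following is a minimal sketch of how a one-vs-rest F-score can be computed for each of the three classes. The function name `per_class_f1` and the toy labels are hypothetical and are not taken from the paper; the authors' actual evaluation pipeline is not described on this page.

```python
from collections import Counter


def per_class_f1(y_true, y_pred, labels):
    """Return {label: F1}, treating each label one-vs-rest.

    F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the denominator is 0.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = {}
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


# Hypothetical toy data, for illustration only (not the paper's data).
y_true = ["news", "news", "opinion", "fake", "fake", "opinion"]
y_pred = ["news", "opinion", "opinion", "fake", "news", "opinion"]
scores = per_class_f1(y_true, y_pred, ["news", "opinion", "fake"])
```

Reporting one score per class, rather than a single averaged number, makes it visible when a minority class such as fake is harder to detect than the others.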

Published
29/11/2021
BATISTA FILHO, Anísio Pereira; ARAÚJO, Débora da Conceição; FERREIRA, Máverick André Dionísio; MATTOS NETO, Paulo Salgado Gomes de. Fake News Detection about Covid-19 in the Portuguese Language. In: ENCONTRO NACIONAL DE INTELIGÊNCIA ARTIFICIAL E COMPUTACIONAL (ENIAC), 18., 2021, Evento Online. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2021. p. 492-503. ISSN 2763-9061. DOI: https://doi.org/10.5753/eniac.2021.18278.
