Identifying “Fake News” in the Brazilian political context: a computational approach

Laura D. de Almeida; Victor Fuzaro; Falmer V. Nieto; André L. M. Santana

doi:10.5753/wics.2021.15966

Laura D. de Almeida UAM http://orcid.org/0000-0003-4399-495X
Victor Fuzaro UAM https://orcid.org/0000-0002-2409-9928
Falmer V. Nieto UAM https://orcid.org/0000-0002-0039-2265
André L. M. Santana UAM / USP https://orcid.org/0000-0002-9807-3253

Abstract

This paper presents the main results of a computational solution for analyzing Brazilian fake news in a political context. It investigates which machine learning algorithm, Support Vector Machine (SVM) or Naive Bayes, performs best at classifying, from natural-language text, whether a political news item is fake or not. The best performance was achieved by the combination of SVM (RBF) + BOW, with 80.4% accuracy, 82% precision, 76% recall, 78% F1-Score, and 88% AUC. The non-probabilistic algorithms proved better at classifying fake news, suggesting a direction for future work in this research area.

Keywords: Fake News, Machine Learning, Natural Language Processing
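
The abstract reports accuracy, precision, recall, and F1-Score for a binary fake/not-fake classifier. As a reminder of how these four metrics relate to one another, the following is a minimal sketch that derives them from confusion-matrix counts. The `Confusion` class, the `metrics` function, and the counts used are hypothetical and chosen only for illustration; they are not taken from the paper's experiments. AUC is left out because it requires ranked classifier scores rather than counts.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Confusion-matrix counts, treating "fake" as the positive class."""
    tp: int  # fake news correctly flagged as fake
    fp: int  # real news wrongly flagged as fake
    fn: int  # fake news missed by the classifier
    tn: int  # real news correctly left unflagged


def metrics(c: Confusion) -> dict:
    """Compute accuracy, precision, recall and F1 from the counts."""
    total = c.tp + c.fp + c.fn + c.tn
    accuracy = (c.tp + c.tn) / total if total else 0.0
    precision = c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0
    recall = c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical counts for illustration only (not the paper's data).
counts = Confusion(tp=30, fp=10, fn=20, tn=40)
scores = metrics(counts)
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

Precision and recall can diverge noticeably even when accuracy looks reasonable, which is one reason for reporting all four values together.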
