Characterization, Evolution, and Pattern Identification in Fake News: A Topic-Modeling Approach

Leonardo Emerson André Alves; Jonice Oliveira; Sírius Silva

doi:10.5753/sbsi_estendido.2024.238687

Leonardo Emerson André Alves Universidade Federal do Rio de Janeiro http://orcid.org/0009-0007-8504-5150
Jonice Oliveira Universidade Federal do Rio de Janeiro https://orcid.org/0000-0002-2495-1463
Sírius Silva Universidade Federal do Rio de Janeiro https://orcid.org/0000-0002-8353-7422

Abstract

This study proposes a methodology for characterizing fake news, tracing its evolution, and identifying the writing patterns it contains. To that end, an unbalanced corpus was first cleaned and refined. The news items were then analyzed with natural language processing and topic-modeling techniques based on traditional algorithms (LDA and LSA). The results include a dictionary that characterizes the writing patterns found in the fake news studied, together with a comparison of the two algorithms' effectiveness measured by topic coherence.
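The abstract compares LDA and LSA through a coherence metric but does not say which coherence measure was used. The following is a minimal sketch under stated assumptions: it uses the UMass measure (one of several options, alongside `c_v`, `c_uci` and `c_npmi`, offered by gensim's `CoherenceModel` through its `coherence` parameter), a deliberately tiny hypothetical Portuguese stopword list, invented headlines, and hypothetical top-word lists standing in for the output of the two models. None of these values come from the study.

```python
import math
import re

# Hypothetical, deliberately tiny Portuguese stopword list for illustration;
# a real pipeline would use a complete list and possibly lemmatization.
STOPWORDS = {"a", "o", "que", "de", "e", "diz", "antes"}


def preprocess(text):
    """Lowercase, split into word tokens (accented letters included), drop stopwords."""
    return [tok for tok in re.findall(r"\w+", text.lower()) if tok not in STOPWORDS]


def umass_coherence(top_words, docs, eps=1.0):
    """UMass coherence of one topic, given its top words in rank order.

    Sums log((D(w_i, w_j) + eps) / D(w_j)) over every pair where w_j is ranked
    above w_i, with D(...) = number of documents containing all given words.
    Higher values indicate a more coherent topic.
    """
    doc_sets = [set(doc) for doc in docs]

    def doc_freq(*words):
        # Number of documents that contain every one of the given words.
        return sum(1 for s in doc_sets if all(w in s for w in words))

    score = 0.0
    for i in range(1, len(top_words)):
        for j in range(i):
            d_j = doc_freq(top_words[j])
            if d_j == 0:
                raise ValueError(f"top word {top_words[j]!r} never occurs in the corpus")
            score += math.log((doc_freq(top_words[i], top_words[j]) + eps) / d_j)
    return score


def mean_coherence(topics, docs):
    """Average per-topic UMass score, usable to compare two models (e.g. LDA vs LSA)."""
    return sum(umass_coherence(t, docs) for t in topics) / len(topics)


# Toy headlines (invented for illustration, not taken from the study's corpus).
raw = [
    "O governo esconde a vacina",
    "A vacina causa doença, diz o governo",
    "Urgente: compartilhe antes que apaguem",
    "Compartilhe urgente a verdade que a mídia esconde",
]
docs = [preprocess(text) for text in raw]

# Hypothetical top words that two different models might have produced.
lda_topics = [["vacina", "governo"], ["compartilhe", "urgente"]]
lsa_topics = [["vacina", "urgente"], ["governo", "compartilhe"]]

print(f"LDA: {mean_coherence(lda_topics, docs):.3f}")  # LDA: 0.405
print(f"LSA: {mean_coherence(lsa_topics, docs):.3f}")  # LSA: -0.693
```

In a real pipeline the topic word lists would come from fitted models (for instance gensim's `LdaModel` and `LsiModel`) rather than being written by hand; the comparison step itself stays the same: score each model's topics against the same preprocessed corpus and prefer the model with the higher average coherence.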

Keywords: Fake news, text analysis, natural language processing, web scraping, topic modeling
