Detecting Covid-19 Misinformation on Twitter
Abstract
The harm caused by false or misleading news has been amplified by the ease with which information spreads on social networks. During the Covid-19 pandemic, which began in 2020, such news caused panic among the population and misinformed people about how to prevent the disease. This work introduces a new corpus of Portuguese-language Twitter posts containing misinformation about Covid-19. In addition to the new corpus, the work evaluates different text representations and learning algorithms on the task of detecting messages that contain misinformation. The best result was an F1-score of 89%, obtained with an SVM classifier using the TF-IDF text representation.
Keywords:
Natural Language Processing, Misinformation, Covid-19
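To make the best-performing configuration named in the abstract more concrete, the following is a minimal, standard-library-only sketch of a TF-IDF representation feeding a linear SVM, scored with F1. It is not the authors' implementation: the example posts, the labels, the smoothed IDF formula, the subgradient-descent trainer, and all hyperparameters (`lam`, `lr`, `epochs`) are assumptions chosen for illustration, and the abstract does not state which SVM kernel or preprocessing was used. In practice a library such as scikit-learn would typically replace the hand-written parts.

```python
import math
from collections import Counter

def tokenize(text):
    # Assumption: lowercase whitespace tokenization, no stemming or stop words.
    return text.lower().split()

def fit_idf(docs):
    # Smoothed inverse document frequency: log((1 + n) / (1 + df)) + 1.
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}

def tfidf_vector(doc, idf):
    # Sparse, L2-normalized TF-IDF vector; out-of-vocabulary terms are ignored.
    counts = Counter(t for t in tokenize(doc) if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else {}

def decision(x, w, b):
    return sum(w.get(t, 0.0) * v for t, v in x.items()) + b

def train_linear_svm(vectors, labels, lam=0.01, lr=0.1, epochs=200):
    # Subgradient descent on the L2-regularized hinge loss; labels are -1 or +1.
    w, b = {}, 0.0
    for _ in range(epochs):
        for x, y in zip(vectors, labels):
            margin = y * decision(x, w, b)
            for t in w:                      # shrink weights (regularization)
                w[t] *= 1.0 - lr * lam
            if margin < 1:                   # hinge loss is active for this example
                for t, v in x.items():
                    w[t] = w.get(t, 0.0) + lr * y * v
                b += lr * y
    return w, b

def predict(x, w, b):
    return 1 if decision(x, w, b) >= 0 else -1

def f1_score(y_true, y_pred, positive=1):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

# Hypothetical toy posts (not from the paper's corpus); +1 marks misinformation.
train_texts = [
    "drinking hot tea cures covid",
    "garlic cures covid in one day",
    "vaccines contain tracking chips",
    "masks reduce the spread of covid",
    "wash your hands to prevent infection",
    "vaccines reduce severe covid cases",
]
labels = [1, 1, 1, -1, -1, -1]

idf = fit_idf(train_texts)
vectors = [tfidf_vector(t, idf) for t in train_texts]
w, b = train_linear_svm(vectors, labels)
train_pred = [predict(x, w, b) for x in vectors]
print("training F1:", f1_score(labels, train_pred))
```

The score printed here is measured on the same six toy posts used for training, so it says nothing about generalization; a result such as the 89% reported in the abstract requires evaluation on held-out data.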
Published
29/11/2021
How to Cite
MOTA, Ana Alice Ximenes; FRANCO, Wellington; MATTOS, César Lincoln Cavalcante. Detecção de desinformação sobre Covid-19 no Twitter. In: SIMPÓSIO BRASILEIRO DE TECNOLOGIA DA INFORMAÇÃO E DA LINGUAGEM HUMANA (STIL), 13., 2021, Evento Online. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2021. p. 172-181. DOI: https://doi.org/10.5753/stil.2021.17796.