Characterization and Prediction of Toxic Users on Twitter/X during the 2022 Brazilian Elections

doi:10.5753/brasnam.2024.2515

Samuel Lopes Pinto UFV
José Julio Campolina UFV
João Pedro M. Sena UFV
Gabriel Félix UFV
Lucas N. Ferreira UFV
Julio C. S. Reis UFV

Abstract

With the rise of smartphones, social platforms have become widely popular thanks to their ease of use. These platforms provide a favorable environment for people to communicate on a wide range of topics. In the political context in particular, they have been widely used both to run online electoral campaigns and to disseminate illicit content, including hate speech. Computational solutions can therefore be useful for the early identification of this type of message. We analyze posts from Twitter/X users and propose an approach that uses BERTimbau, a BERT model pre-trained for Brazilian Portuguese, to identify potentially toxic users in the Brazilian political context. Our best results show that it is possible to reach an F1 score of around 85% in the task of identifying potentially toxic users. In addition to contributing to the understanding of the characteristics of toxic speech on Twitter/X, this study highlights the potential of machine learning approaches to identify users who behave inappropriately online, which can help mitigate the impact of the spread of this type of content. Warning! This paper contains offensive words and tweet examples.
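For readers unfamiliar with the metric reported above, the F1 score is the harmonic mean of precision and recall. The following is a minimal sketch, assuming binary labels (1 = potentially toxic user, 0 = otherwise) and F1 computed for the positive class; the abstract does not state which F1 variant the authors report. The helper name `f1_score` and the label lists are hypothetical illustrations, not data or results from the study.

```python
def f1_score(y_true, y_pred):
    """Compute F1 for the positive class (1) from two equal-length label lists."""
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must have the same length")
    # Count true positives, false positives, and false negatives.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        # No correct positive predictions: F1 is defined as 0 here.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical user-level labels (made up for illustration only).
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1, 0, 1]
score = f1_score(y_true, y_pred)
print(f"F1 = {score:.2f}")
```

In this toy example there are 4 true positives, 1 false positive, and 1 false negative, so precision and recall are both 0.8 and so is F1.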
