Contextual BERT Model for Toxicity Detection in Messaging Platforms

  • Arthur Buzelin UFMG
  • Yan Aquino UFMG
  • Pedro Bento UFMG
  • Lucas Dayrell UFMG
  • Victoria Estanislau UFMG
  • Samira Malaquias UFMG
  • Pedro Dutenhefner UFMG
  • Luisa G. Porfírio UFMG
  • Pedro B. Rigueira UFMG
  • Caio Souza Grossi UFMG
  • Guilherme H. G. Evangelista UFMG
  • Gisele L. Pappa UFMG
  • Wagner Meira Jr. UFMG

Abstract

The growing use of messaging platforms has created new challenges for hate speech detection. Classification models designed for social media posts often fall short in these environments because they lack contextual information. This paper presents an approach to message classification that integrates contextual data from preceding messages, using a fine-tuned BERT model based on PySentimiento. Our results show that incorporating preceding messages substantially improves classification performance. The average AUC-ROC increased from 0.691 with the PySentimiento base model to 0.784 with standard fine-tuning, and further to 0.926 with our context-based model.
Keywords: BERT, Transformers, Toxicity detection, AI, Fine-tuning
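
The abstract does not say how preceding messages are fed to the model, so the following is only a minimal sketch of one common way to do it: joining a fixed window of earlier messages with the target message before tokenization. The function name `build_context_input`, the `window` parameter, and the use of the literal `[SEP]` string as a separator are all assumptions made for illustration, not details taken from the paper.

```python
from typing import List

# Assumed separator: BERT-style tokenizers usually map "[SEP]" to a
# segment-boundary token, but the paper's actual input format is not stated.
SEP = " [SEP] "


def build_context_input(messages: List[str], index: int, window: int = 3) -> str:
    """Join up to `window` preceding messages with the message at `index`.

    The result is a single string that could be passed to a tokenizer,
    with the target message placed last.
    """
    if not 0 <= index < len(messages):
        raise IndexError("index out of range")
    if window < 0:
        raise ValueError("window must be non-negative")
    start = max(0, index - window)
    context = messages[start:index]  # earlier messages, oldest first
    return SEP.join(context + [messages[index]])


if __name__ == "__main__":
    chat = ["good morning", "did anyone watch the match?", "yes", "you played terribly"]
    print(build_context_input(chat, 3, window=2))
```

With `window=0` the function returns the target message alone, which corresponds to classifying a message without context; larger windows trade longer inputs for more conversational history.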

Published
17/11/2024
BUZELIN, Arthur et al. Contextual BERT Model for Toxicity Detection in Messaging Platforms. In: ENCONTRO NACIONAL DE INTELIGÊNCIA ARTIFICIAL E COMPUTACIONAL (ENIAC), 21., 2024, Belém/PA. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2024. p. 846-857. ISSN 2763-9061. DOI: https://doi.org/10.5753/eniac.2024.245275.