Contextual BERT Model for Toxicity Detection in Messaging Platforms

Arthur Buzelin; Yan Aquino; Pedro Bento; Lucas Dayrell; Victoria Estanislau; Samira Malaquias; Pedro Dutenhefner; Luisa G. Porfírio; Pedro B. Rigueira; Caio Souza Grossi; Guilherme H. G. Evangelista; Gisele L. Pappa; Wagner Meira Jr.

doi:10.5753/eniac.2024.245275

Arthur Buzelin UFMG
Yan Aquino UFMG
Pedro Bento UFMG
Lucas Dayrell UFMG
Victoria Estanislau UFMG
Samira Malaquias UFMG
Pedro Dutenhefner UFMG
Luisa G. Porfírio UFMG
Pedro B. Rigueira UFMG
Caio Souza Grossi UFMG
Guilherme H. G. Evangelista UFMG
Gisele L. Pappa UFMG
Wagner Meira Jr. UFMG

Abstract

The increasing prevalence of messaging platforms has created new challenges in hate speech detection. Traditional classification models designed for social media posts often fall short in these environments because they lack contextual information. This paper presents an approach to message classification that integrates contextual data from preceding messages, using a fine-tuned BERT model based on PySentimiento. Our results demonstrate that incorporating preceding messages substantially improves classification performance. The average AUC-ROC increased from 0.691 with the PySentimiento base model to 0.784 with standard fine-tuning, and further to 0.926 with our context-based model.

Keywords: BERT, Transformers, Toxicity detection, AI, Fine-tuning
