How Good Is ChatGPT For Detecting Hate Speech In Portuguese?

Amanda S. Oliveira; Thiago C. Cecote; Pedro H. L. Silva; Jadson C. Gertrudes; Vander L. S. Freitas; Eduardo J. S. Luz

doi:10.5753/stil.2023.233943

Amanda S. Oliveira UFOP http://orcid.org/0009-0006-8000-7297
Thiago C. Cecote UFOP http://orcid.org/0009-0008-7847-7315
Pedro H. L. Silva UFOP https://orcid.org/0000-0002-5525-6121
Jadson C. Gertrudes UFOP https://orcid.org/0000-0002-0861-6681
Vander L. S. Freitas UFOP https://orcid.org/0000-0001-7989-0816
Eduardo J. S. Luz UFOP https://orcid.org/0000-0001-5249-1559

Abstract

This study evaluates OpenAI’s ChatGPT, a large language model, on detecting hate speech in Portuguese tweets and compares it with purpose-trained models. Despite its considerable computational cost, ChatGPT used as a zero-shot classifier performed on par with or better than state-of-the-art methods, reaching an F1-score of 73.0% on the ToLD-BR dataset. In a cross-dataset evaluation on the HLPHSP dataset, it achieved a superior F1-score of 73%. The choice of prompt significantly affects the outcome, with a wider-scope prompt balancing precision and recall. Because of its interpretability and resilience to data distribution shifts, ChatGPT could be a preferred choice for tasks that prioritize these factors.

Keywords: Hate Speech, NLP, LLM, ChatGPT, BERTimbau
