How Good Is ChatGPT For Detecting Hate Speech In Portuguese?
Abstract
This study evaluates OpenAI's ChatGPT, a large language model, on detecting hate speech in Portuguese tweets and compares it with purpose-trained models. Although it incurs considerable computational cost, ChatGPT used as a zero-shot classifier performed on par with or better than state-of-the-art methods, reaching an F1-score of 73.0% on the ToLD-BR dataset. In a cross-dataset evaluation on the HLPHSP dataset, it achieved a superior F1-score of 73%. The choice of prompt significantly affects the outcome: a wider-scope prompt yields a better balance between precision and recall. Because of its interpretability and its resilience to shifts in data distribution, ChatGPT could be the preferred choice for tasks that prioritize these factors.
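As a rough illustration of why balancing precision and recall matters for the F1-score reported above, the following minimal sketch computes the three metrics from binary labels. The function name `precision_recall_f1` and the label lists are hypothetical toy data invented for this example; they are not the study's prompts, predictions, or results.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall and F1 for the positive (hateful) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy gold labels (1 = hate speech, 0 = not); made up for illustration.
gold = [1, 1, 1, 1, 0, 0, 0, 0]
# A hypothetical "narrow" prompt: flags few tweets, all of them correctly.
narrow = [1, 1, 0, 0, 0, 0, 0, 0]
# A hypothetical "wider" prompt: flags more tweets, with one false alarm.
wide = [1, 1, 1, 0, 1, 0, 0, 0]

narrow_scores = precision_recall_f1(gold, narrow)
wide_scores = precision_recall_f1(gold, wide)
print("narrow:", narrow_scores)
print("wide:  ", wide_scores)
```

In this toy case the narrow predictions have perfect precision but only half the recall, while the wider predictions trade a little precision for recall and end up with the higher F1, since the harmonic mean penalizes an imbalance between the two.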