A Change in Perspective: The Trade-Off Between Perspective API and Custom Models in Classifying Hate Speech in Portuguese
Abstract
This paper examines how well the Perspective API, developed by Jigsaw, detects hate speech in Portuguese. Although the Perspective API supports multiple languages, its performance metrics are often reported in aggregate, which obscures per-language results. Our study finds that the API’s AUC-ROC score for Portuguese is significantly lower than for English (0.744 vs. 0.942). To address this, we developed a BERT classifier trained on a Portuguese Twitter hate speech dataset. Our model, with just 100 messages in its training set, outperformed the Perspective API. These findings highlight the need for more granular performance metrics and suggest that custom models may offer better solutions for specific languages.
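For readers unfamiliar with the metric used in the comparison above, the following is a minimal sketch of how an AUC-ROC score can be computed from binary labels and classifier scores. It uses the pairwise (Mann-Whitney) formulation: the fraction of positive/negative pairs in which the positive example receives the higher score. The function name `auc_roc` and the toy labels and scores are hypothetical illustrations, not the data or code used in this study.

```python
from typing import Sequence


def auc_roc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Pairwise AUC-ROC: probability that a randomly chosen positive
    example scores higher than a randomly chosen negative one.
    Ties between a positive and a negative count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (1 = hate speech, 0 = not hate speech).
labels = [1, 0, 1, 0, 1, 0]
# Hypothetical scores from two classifiers on the same six messages.
api_scores = [0.9, 0.8, 0.4, 0.3, 0.7, 0.5]
custom_scores = [0.9, 0.2, 0.6, 0.3, 0.8, 0.4]

print(f"classifier A AUC-ROC: {auc_roc(labels, api_scores):.3f}")
print(f"classifier B AUC-ROC: {auc_roc(labels, custom_scores):.3f}")
```

A score of 0.5 corresponds to random ranking and 1.0 to a perfect separation of the two classes, which is why a gap such as 0.744 vs. 0.942 indicates a marked difference in ranking quality. The quadratic loop is chosen for clarity; for large datasets a rank-based implementation would be preferable.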