Multiple Voices, Greater Power: A Strategy for Combining Language Models to Combat Hate Speech
Abstract
Social media platforms face significant challenges in keeping their environments free of offensive comments and hate speech. Some of these challenges are inherently linked to the diversity of user perspectives, which complicates the classification and detection of hate speech, particularly in culturally rich and diverse countries like Brazil. To address these complexities in identifying hate speech in Brazilian Portuguese, our work proposes ensemble methods based on stacking and soft voting that combine four distinct language models with varied architectures and pre-training regimes: BERTimbau-base, BERTimbau-large, BERTweet.BR, and Bernice. The findings show that the proposed approach outperforms the individual prediction models, suggesting that combining multiple models may effectively integrate different perspectives, with an accuracy improvement of up to 6% over the classifications of the individual models.
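As a rough illustration of the soft-voting idea mentioned above, the sketch below averages the class probabilities produced by each model and picks the class with the highest mean. It is a minimal sketch, not the authors' implementation: the probability values are invented for illustration, and the function names `soft_vote` and `predict` are hypothetical. It assumes each model outputs [P(not hate), P(hate)] per comment.

```python
from typing import Dict, List

# Hypothetical per-model outputs: for each comment, [P(not hate), P(hate)].
# These numbers are made up for illustration; they are not results from the paper.
model_probs: Dict[str, List[List[float]]] = {
    "BERTimbau-base":  [[0.80, 0.20], [0.45, 0.55], [0.30, 0.70]],
    "BERTimbau-large": [[0.70, 0.30], [0.60, 0.40], [0.20, 0.80]],
    "BERTweet.BR":     [[0.90, 0.10], [0.35, 0.65], [0.55, 0.45]],
    "Bernice":         [[0.60, 0.40], [0.40, 0.60], [0.25, 0.75]],
}


def soft_vote(probs_by_model: Dict[str, List[List[float]]]) -> List[List[float]]:
    """Average the class probabilities of all models, sample by sample."""
    models = list(probs_by_model.values())
    n_models = len(models)
    n_samples = len(models[0])
    n_classes = len(models[0][0])
    averaged = []
    for i in range(n_samples):
        row = [sum(m[i][c] for m in models) / n_models for c in range(n_classes)]
        averaged.append(row)
    return averaged


def predict(averaged: List[List[float]]) -> List[int]:
    """Return the index of the highest averaged probability for each sample."""
    return [max(range(len(row)), key=row.__getitem__) for row in averaged]


averaged = soft_vote(model_probs)
labels = predict(averaged)
print(labels)
```

In this toy data, one model disagrees with the others on the second comment and another on the third, yet the averaged probabilities still follow the majority signal. A stacking ensemble would instead train a meta-classifier on these per-model probabilities; that step is not shown here.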