Investigating Contextual Word Embeddings in Semi-Supervised Learning for Toxic Comment Detection
Abstract
The proliferation of toxic messages on the Web has led many social networks to limit or shut down user comments, since these messages harm people and drive them away from online platforms. Most approaches to this problem rely on supervised learning and focus only on identifying whether a comment is toxic, leaving aside the various types of harmful messages, such as LGBT+phobia, racism, and xenophobia. In this paper, we investigate three contextual word embedding models, BERTimbau, BERTweet.BR, and LLaMA 3.1, within a semi-supervised approach to determine whether a comment is toxic. We also explore this strategy to detect six types of toxicity: LGBT+phobia, insult, racism, obscenity, xenophobia, and misogyny. We define this second task as a multi-label classification problem, since a comment may contain several types of abusive language. We evaluated our approach on the ToLD-BR corpus and achieved competitive results in the binary toxicity classification task. In multi-label toxicity detection, our best result outperformed supervised approaches while using significantly less labeled data, which highlights the efficiency and practicality of the semi-supervised approach.
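To make the multi-label formulation concrete, the following minimal Python sketch shows one way to encode a comment's annotations as a 0/1 vector over the six toxicity types listed above. The helper names `to_multi_hot` and `is_toxic` are hypothetical and introduced only for illustration, as is the rule that a comment counts as toxic when at least one type is present. The sample annotation is invented and does not come from ToLD-BR.

```python
# The six toxicity types named in the abstract, in a fixed order.
LABELS = ["LGBT+phobia", "insult", "racism", "obscenity", "xenophobia", "misogyny"]


def to_multi_hot(types):
    """Encode a collection of toxicity types as a 0/1 vector aligned with LABELS."""
    types = set(types)
    unknown = types - set(LABELS)
    if unknown:
        raise ValueError(f"unknown toxicity types: {sorted(unknown)}")
    return [1 if label in types else 0 for label in LABELS]


def is_toxic(vector):
    """Binary view (an assumption of this sketch): toxic if any type is present."""
    return any(vector)


# Hypothetical annotation, for illustration only (not taken from ToLD-BR).
example = to_multi_hot({"insult", "misogyny"})
print(example)            # one position per entry in LABELS
print(is_toxic(example))  # binary label derived from the multi-label vector
```

Under this encoding a single comment can carry several labels at once, which is what separates the multi-label task from the binary one.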
