Evaluating Hate Speech Detection to Unseen Target Groups

  • Alexandre Negretti UNICAMP
  • Marcos M. Raimundo UNICAMP


LLMs trained to detect hate speech face a significant challenge in identifying hate speech directed at new or less common target groups. This happens because the models are trained primarily on data covering the more prevalent forms of hate, which target groups that have historically been subjected to hate speech. Not only do the forms of defamation evolve over time, but new targets may also emerge, bringing forms of hate that were previously absent from datasets. This work analyzes how the targeted group influences model predictions. We evaluate training strategies that address target-group bias in hate speech detectors. Lastly, we present a novel dataset of Twitter posts about the 2022 Russia-Ukraine war.
Keywords: Large Language Models, Hate Speech Detection, Slavic Hate Dataset


