Detecting Hate, Offensive, and Regular Speech in Short Comments

  • Thais G. Almeida UFAM
  • Bruno A. Souza UFAM
  • Fabíola Guerra Nakamura UFAM
  • Eduardo Freire Nakamura UFAM

Abstract


The freedom of expression provided by the Internet also favors malicious groups that spread hateful content, recruit new members, and threaten users. In this context, we propose a new approach for hate speech identification that represents documents using Information Theory quantifiers (entropy and divergence). What distinguishes our approach is that it captures the weighted information of words rather than just their frequency in documents. The results show that our approach outperforms techniques that combine data representations, such as TF-IDF and unigrams, with text classifiers, achieving F1-scores of 86%, 84%, and 96% for the hate, offensive, and regular speech classes, respectively. Compared to the baselines, our proposal is a win-win solution that improves both efficacy (F1-score) and efficiency (by reducing the dimension of the feature vector). The proposed solution is up to 2.27 times faster than the baseline.
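The abstract names entropy and divergence as the document representation but does not give the exact formulas or word weighting. The sketch below only illustrates the general idea under stated assumptions: each class gets a smoothed word distribution, and a document becomes a short vector holding its Shannon entropy plus its Kullback-Leibler (KL) divergence to each class. The choice of KL, the additive smoothing, and the helper names (`distribution`, `entropy`, `kl_divergence`, `features`) are hypothetical and not taken from the paper.

```python
import math
from collections import Counter


def distribution(tokens, vocab, alpha=0.0):
    """Word probabilities over `vocab`; alpha > 0 applies additive smoothing (assumption)."""
    counts = Counter(t for t in tokens if t in vocab)
    total = sum(counts.values()) + alpha * len(vocab)
    if total == 0:
        return {w: 0.0 for w in vocab}
    return {w: (counts[w] + alpha) / total for w in vocab}


def entropy(p):
    """Shannon entropy in bits; zero-probability words contribute nothing."""
    return -sum(q * math.log2(q) for q in p.values() if q > 0)


def kl_divergence(p, q):
    """KL(p || q) in bits; q must be positive wherever p is (ensured by smoothing q)."""
    return sum(pi * math.log2(pi / q[w]) for w, pi in p.items() if pi > 0)


def features(tokens, class_dists, vocab):
    """Hypothetical feature vector: [entropy, KL to each class in sorted order]."""
    p = distribution(tokens, vocab)
    return [entropy(p)] + [kl_divergence(p, class_dists[c]) for c in sorted(class_dists)]


# Hypothetical toy corpus with placeholder tokens, one training document per class.
train = {
    "hate": [["a", "b", "b"]],
    "offensive": [["b", "c"]],
    "regular": [["c", "d", "d"]],
}
vocab = {"a", "b", "c", "d"}
class_dists = {
    label: distribution([t for doc in docs for t in doc], vocab, alpha=1.0)
    for label, docs in train.items()
}

feats = features(["a", "b"], class_dists, vocab)
print(feats)  # entropy, then KL to "hate", "offensive", "regular"
```

In this toy run the document's entropy is 1 bit, and its divergence is smallest for the "hate" class, which a downstream classifier could exploit. Note that the vector length is one plus the number of classes regardless of vocabulary size, which is one plausible way such a representation stays far smaller than a TF-IDF or unigram vector.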
Published
17/10/2017
ALMEIDA, Thais G.; SOUZA, Bruno A.; NAKAMURA, Fabíola Guerra; NAKAMURA, Eduardo Freire. Detecting Hate, Offensive, and Regular Speech in Short Comments. In: BRAZILIAN SYMPOSIUM ON MULTIMEDIA AND THE WEB (WEBMEDIA), 23., 2017, Gramado. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2017. p. 225-228.
