Detecting Hate, Offensive, and Regular Speech in Short Comments

Thais G. Almeida; Bruno A. Souza; Fabíola Guerra Nakamura; Eduardo Freire Nakamura

Thais G. Almeida UFAM
Bruno A. Souza UFAM
Fabíola Guerra Nakamura UFAM
Eduardo Freire Nakamura UFAM

Abstract

The freedom of expression provided by the Internet also favors malicious groups that spread hateful content, recruit new members, and threaten users. In this context, we propose a new approach to hate speech identification that represents documents with Information Theory quantifiers (entropy and divergence). Our approach differs from prior work in that it captures weighted information about words, rather than just their frequency in documents. The results show that our approach outperforms techniques that combine data representations such as TF-IDF and unigrams with text classifiers, achieving F1-scores of 86%, 84%, and 96% for the hate, offensive, and regular speech classes, respectively. Compared to the baselines, our proposal is a win-win solution that improves both efficacy (F1-score) and efficiency (by reducing the dimension of the feature vector). The proposed solution is up to 2.27 times faster than the baseline.
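
To make the idea of representing a document by entropy and divergence concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not say which entropy or divergence is used, so Shannon entropy and Jensen-Shannon divergence are assumed here; the function names, the vocabulary, and the per-class reference distributions are all hypothetical. Under these assumptions, each comment is reduced to one entropy value plus one divergence per class, which illustrates how such a representation can be much smaller than a TF-IDF vector.

```python
import math
from collections import Counter


def word_distribution(tokens, vocabulary):
    """Relative frequency of each vocabulary word in a token list."""
    counts = Counter(tokens)
    total = sum(counts[w] for w in vocabulary)
    if total == 0:
        # No known word in the comment: fall back to a uniform distribution.
        return [1.0 / len(vocabulary)] * len(vocabulary)
    return [counts[w] / total for w in vocabulary]


def shannon_entropy(p):
    """Shannon entropy in bits (assumed quantifier, not confirmed by the source)."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def jensen_shannon_divergence(p, q):
    """Jensen-Shannon divergence in bits (assumed quantifier); ranges from 0 to 1."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(a, b):
        # b is the mixture m, so b[i] > 0 whenever a[i] > 0.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def document_features(tokens, class_distributions, vocabulary):
    """Hypothetical feature vector: [entropy, divergence to each class distribution]."""
    p = word_distribution(tokens, vocabulary)
    features = [shannon_entropy(p)]
    features += [jensen_shannon_divergence(p, q) for q in class_distributions]
    return features
```

With three classes (hate, offensive, regular), `document_features` returns four numbers per comment regardless of vocabulary size; these could then be passed to any standard classifier.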
