Feature extraction for hate speech identification in documents

  • Cleiton Lima UFFS
  • Guilherme Dal Bianco UFFS

Abstract


Social media is increasingly present in people's lives, and many platforms offer tools that let users collaborate in creating content. Many users exploit these functions to post texts that spread illicit or criminal content. Most work on abusive-language identification uses supervised learning, which depends on feature extraction to achieve good quality. Meta-features represent the state of the art in feature extraction for text classification. In this work, we propose a combination of feature extraction techniques to improve the detection of offensive speech using meta-features. Our results on real datasets show that the proposed combination of features outperforms state-of-the-art approaches by around 3.5% in effectiveness.

Keywords: Social Media, Feature Extraction, Hate Speech

Published
2019-04-10
LIMA, Cleiton; DAL BIANCO, Guilherme. Feature extraction for hate speech identification in documents. In: REGIONAL DATABASE SCHOOL (ERBD), 15., 2019, Chapecó. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2019. p. 61-70. ISSN 2595-413X. DOI: https://doi.org/10.5753/erbd.2019.8479.