DOI: 10.1145/3323503.3360619
short-paper

Hate speech detection using Brazilian imageboards

Published: 29 October 2019

ABSTRACT

The development of internet communication platforms has changed how people interact, and hate speech and offensive language have emerged as a contemporary problem. Social networks allow users with different opinions and backgrounds to interact without face-to-face contact. This distance gives users a sense of safety when promoting hate speech, an effect that is even stronger in anonymous environments. Imageboards are sites composed of multiple boards, each aggregating discussion on a different topic. On some boards, anonymous users widely promote hate speech. However, only a few works in the literature have focused on hate speech in imageboard content. This work aims to detect hate speech in Brazilian Portuguese texts, using data from the Brazilian imageboard 55chan to build a dataset with hate speech content. Three classifiers were trained for binary hate speech classification. The Linear Support Vector Classifier achieved the best result, with an F1-score of 0.955.
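
For readers unfamiliar with the reported metric, the sketch below shows how an F1-score is computed for a binary task such as hate speech versus non-hate speech. It is only an illustration of the standard definition (the harmonic mean of precision and recall). The function name `f1_score_binary` and the toy labels are hypothetical and are not taken from the paper or its dataset.

```python
def f1_score_binary(y_true, y_pred, positive=1):
    """Return (precision, recall, F1) for the positive class."""
    pairs = list(zip(y_true, y_pred))
    # Count true positives, false positives and false negatives.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical toy labels (assumption): 1 = hate speech, 0 = not hate speech.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1, 0, 1]

precision, recall, f1 = f1_score_binary(y_true, y_pred)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Because F1 balances precision and recall, a high score such as the 0.955 reported above requires the classifier to both flag most hate speech posts and raise few false alarms.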


      • Published in

        WebMedia '19: Proceedings of the 25th Brazilian Symposium on Multimedia and the Web
        October 2019
        537 pages
        ISBN: 9781450367639
        DOI: 10.1145/3323503

        Copyright © 2019 ACM


        Publisher

        Association for Computing Machinery

        New York, NY, United States




        Acceptance Rates

        Overall Acceptance Rate: 270 of 873 submissions, 31%
