short-paper

Hate speech detection using Brazilian imageboards

Authors:
Gabriel Nascimento (CEFET/RJ, Rio de Janeiro, RJ)
Flavio Carvalho (CEFET/RJ, Rio de Janeiro, RJ)
Alexandre Martins da Cunha (CEFET/RJ - UFF, Rio de Janeiro, RJ)
Carlos Roberto Viana (CEFET/RJ, Rio de Janeiro, RJ)
Gustavo Paiva Guedes (CEFET/RJ, Rio de Janeiro, RJ)

WebMedia '19: Proceedings of the 25th Brazilian Symposium on Multimedia and the Web, October 2019, Pages 325–328, https://doi.org/10.1145/3323503.3360619

Published: 29 October 2019


ABSTRACT

With the changes in human interaction prompted by the development of communication platforms on the internet, hate speech and offensive language have emerged as a contemporary problem. Social networks allow users with different opinions and backgrounds to interact without face-to-face contact. This distance gives users a sense of safety that encourages hate speech, an effect that is even stronger in anonymous environments. Imageboards are sites composed of multiple boards, each dedicated to a different topic, and on some of these boards anonymous users widely promote hate speech. However, only a few works in the literature have focused on hate speech in imageboard content. This work aims to classify Brazilian Portuguese texts to detect hate speech, using data from the Brazilian imageboard 55chan to build a dataset with hate speech content. Three classifiers were trained for binary hate speech classification. The Linear Support Vector Classifier achieved the best result, with an F1-score of 0.955.
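The abstract does not describe the features, preprocessing, or training setup, so the following is only a hypothetical sketch of what a binary text-classification pipeline of this kind can look like: TF-IDF vectors, a linear SVM, and an F1-score on the positive (hate speech) class. Every choice here is an assumption rather than the authors' method: the whitespace tokenizer, the smoothed IDF formula, the Pegasos stochastic subgradient solver for the hinge loss (a swapped-in solver, not the one behind scikit-learn's LinearSVC), the `lam` and `epochs` values, and all function names. The abstract also does not state whether the reported 0.955 is a binary, macro, or weighted F1; the sketch computes binary F1.

```python
import math
import random
from collections import Counter

BIAS = ("__bias__",)  # tuple key, so it can never collide with a word token


def tokenize(text):
    # Naive lowercase/whitespace tokenizer (assumption); real Portuguese text
    # would also need punctuation and accent handling.
    return text.lower().split()


def fit_idf(docs):
    """Smoothed IDF, log((1 + n) / (1 + df)) + 1, computed on training documents."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + d)) + 1 for term, d in df.items()}


def vectorize(doc, idf):
    """L2-normalised sparse TF-IDF vector plus a constant bias feature."""
    counts = Counter(tokenize(doc))
    vec = {t: c * idf[t] for t, c in counts.items() if t in idf}  # unseen terms dropped
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    vec = {t: v / norm for t, v in vec.items()}
    vec[BIAS] = 1.0  # bias is regularised like any other weight (a simplification)
    return vec


def dot(w, x):
    return sum(w.get(f, 0.0) * v for f, v in x.items())


def train_linear_svm(X, y, lam=0.1, epochs=100, seed=0):
    """Pegasos: stochastic subgradient descent on the L2-regularised hinge loss.

    X is a list of sparse vectors (dicts), y a list of labels in {0, 1}.
    """
    rng = random.Random(seed)
    w, t = {}, 0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            yi = 1 if y[i] == 1 else -1
            violated = yi * dot(w, X[i]) < 1  # hinge loss is active
            for f in w:  # shrink step from the L2 regulariser
                w[f] *= 1.0 - eta * lam
            if violated:
                for f, v in X[i].items():
                    w[f] = w.get(f, 0.0) + eta * yi * v
    return w


def predict(w, x):
    return 1 if dot(w, x) >= 0 else 0


def f1_score(y_true, y_pred, positive=1):
    """Binary F1 for the positive (hate speech) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

A typical call sequence is `fit_idf` on the training texts only, `vectorize` on every text with that IDF table, `train_linear_svm` on the training vectors, and `f1_score` on predictions for held-out texts. Fitting the IDF on training data alone avoids leaking test-set statistics into the features.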


Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
      1. Information extraction
2. Human-centered computing
  1. Collaborative and social computing
    1. Collaborative and social computing theory, concepts and paradigms
      1. Social media


Published in
WebMedia '19: Proceedings of the 25th Brazilian Symposium on Multimedia and the Web
October 2019
537 pages
ISBN: 9781450367639
DOI: 10.1145/3323503
General Chairs:
Joel dos Santos (CEFET/RJ)
Débora Christina Muchaluat Saade (UFF)
Maria da Graça C. Pimentel (University of Sao Paulo, Brazil)
Alessandra Alaniz Macedo (University of Sao Paulo, Brazil)
Copyright © 2019 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
hate speech detection
imageboards
text mining

Acceptance Rates
Overall acceptance rate: 270 of 873 submissions (31%)

Article Metrics
- Total citations: 7
- Total downloads: 228
- Downloads (last 12 months): 28
- Downloads (last 6 weeks): 1