Identifying Xenophobia in Twitter Posts Using Support Vector Machine with TF/IDF Strategy

Alisson Rodrigo Santana dos Santos; Cleyton Mário de Oliveira Rodrigues; Henning Barly Summer de Melo

Alisson Rodrigo Santana dos Santos Escola Politécnica de Pernambuco
Cleyton Mário de Oliveira Rodrigues Escola Politécnica de Pernambuco
Henning Barly Summer de Melo Escola Politécnica de Pernambuco

Abstract

Context: Xenophobia is the fear of foreign groups. However, the phenomenon is understood to encompass much more than fear: it also includes rejection of, and hostility towards, other ethnic groups. Although it is not a new problem, recent factors such as economic and humanitarian crises indicate that it is growing. Problem: Twitter is one of the social networks most used in data mining studies because of its large volume of posts. These same characteristics make the platform conducive to the spread of hate speech. Solution: This research aims to develop a system that classifies tweets as xenophobic or not. IS theory: This work was conceived under the aegis of Organizational Learning Theory. Specifically, Support Vector Machines (SVM) were combined with the TF-IDF (term frequency-inverse document frequency) weighting technique to build a predictive model that learns patterns from the collected data. Method: The research is quantitative and follows three methodological procedures: (i) data collection, (ii) controlled laboratory experiments, and (iii) construction of the classifier. Summary of Results: Among the classifier configurations evaluated, the SVM with a sigmoid kernel performed best, with an accuracy of 90%. These results are encouraging for the identification of xenophobia in social media. Contribution and Impact in the IS area: Besides the classification system, this work contributes a xenophobia dataset, which, to the best of our knowledge, did not previously exist in the Brazilian context.
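The abstract does not include the classifier's code. Purely as an illustration, the pipeline it names (TF-IDF features fed to an SVM with a sigmoid kernel) can be sketched in dependency-free Python. Every detail below is an assumption rather than the authors' method: the whitespace tokenizer, the smoothed IDF variant, the kernel parameters gamma = 1 and coef0 = 0, and the four toy sentences, which are hypothetical and not taken from the study's dataset. The SVM is trained with a kernelised Pegasos stochastic sub-gradient loop instead of an SMO-based solver such as the one in LIBSVM; which solver the study used is not stated here.

```python
import math
import random
from collections import Counter


def tokenize(text):
    # Minimal tokenizer (an assumption): lowercase and split on whitespace.
    return text.lower().split()


def fit_idf(docs):
    """Smoothed IDF: idf(t) = ln((1 + N) / (1 + df(t))) + 1."""
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))  # count each term once per document
    n = len(docs)
    return {term: math.log((1 + n) / (1 + c)) + 1 for term, c in df.items()}


def tfidf_vector(text, idf):
    """L2-normalised sparse TF-IDF vector {term: weight}; unseen terms are dropped."""
    counts = Counter(t for t in tokenize(text) if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm > 0 else vec


def sigmoid_kernel(u, v, gamma=1.0, coef0=0.0):
    """Sigmoid kernel K(u, v) = tanh(gamma * <u, v> + coef0) on sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    return math.tanh(gamma * dot + coef0)


def train_kernel_pegasos(X, y, lam=0.01, epochs=10, seed=0):
    """Kernelised Pegasos: stochastic sub-gradient descent on the SVM hinge loss.

    Labels must be -1 or +1. Returns alpha, where alpha[i] counts how often
    example i violated the margin; the decision function is
    f(x) = sum_j alpha[j] * y[j] * K(X[j], x), up to a positive scale factor.
    """
    rng = random.Random(seed)
    alpha = [0] * len(X)
    t = 0
    for _ in range(epochs):
        order = list(range(len(X)))
        rng.shuffle(order)  # visit every example once per epoch, in random order
        for i in order:
            t += 1
            score = sum(alpha[j] * y[j] * sigmoid_kernel(X[j], X[i])
                        for j in range(len(X)) if alpha[j])
            if y[i] * score / (lam * t) < 1:  # margin violated: add example i
                alpha[i] += 1
    return alpha


def predict(x, X, y, alpha):
    """Sign of the kernel decision function; ties default to -1."""
    score = sum(a * yj * sigmoid_kernel(xj, x)
                for a, yj, xj in zip(alpha, y, X) if a)
    return 1 if score > 0 else -1


# Hypothetical toy data (not the authors' dataset): +1 = xenophobic, -1 = not.
docs = ["foreigners go home", "go back to your country",
        "refugees are welcome here", "welcome our new neighbours"]
labels = [1, 1, -1, -1]

idf = fit_idf(docs)
X = [tfidf_vector(d, idf) for d in docs]
alpha = train_kernel_pegasos(X, labels)
print([predict(x, X, labels, alpha) for x in X])            # [1, 1, -1, -1]
print(predict(tfidf_vector("go home", idf), X, labels, alpha))  # 1
```

The sigmoid kernel is not positive semi-definite for every choice of gamma and coef0, so the convergence guarantees of Pegasos, which assume a valid (PSD) kernel, do not strictly carry over; for real experiments a tuned library implementation with cross-validated kernel parameters would be the more reliable choice.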

Keywords: Xenophobia, Machine Learning, Text Mining, Twitter Posts
