Aggressive Language Detection Using VGCN-BERT for Spanish Texts

Mamani-Condori, Errol; Ochoa-Luna, José

doi:10.1007/978-3-030-91699-2_25

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13074)

Included in the following conference series:

Brazilian Conference on Intelligent Systems

Abstract

The growing influence of social media users has led to the spread of aggressive content across the internet. To tackle this problem, recent work in Aggressive Language Detection has shown that Deep Learning techniques perform well. More recently, Transformer-based architectures such as Bidirectional Encoder Representations from Transformers (BERT) have outperformed previous aggressive text detection baselines. However, most Transformer-based approaches are unable to properly capture global information such as the vocabulary of the language. Thus, in this work, we focus on aggressive content detection by combining a Vocabulary Graph Convolutional Network (VGCN), which captures global information, with BERT, which models local information. This combined approach, called VGCN-BERT, allows us to improve the feature-level representation in Spanish aggressive language detection. Our experiments were performed on the MEX-A3T aggressiveness benchmark dataset, which is composed of aggressive and non-aggressive tweets written in the Mexican variant of Spanish. With VGCN-BERT we report an F1-score of 86.46%, which is comparable to the current state of the art, an ensemble of BERT models, on the aggressiveness detection track of MEX-A3T 2020.
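The F1-score reported above is the harmonic mean of precision and recall. The following is a minimal sketch of that computation in plain Python, not the authors' evaluation code. It assumes F1 is measured on a single positive class (here labelled `1`, standing in for "aggressive"); the abstract does not state which averaging was used, and the function name `f1_score` and the toy labels are purely illustrative.

```python
def f1_score(y_true, y_pred, positive=1):
    """F1 for one class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    # Count true positives, false positives and false negatives.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No correct positive prediction: precision or recall is zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels (hypothetical): 1 = aggressive, 0 = non-aggressive.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
score = f1_score(y_true, y_pred)
print(f"F1 = {score:.4f}")
```

On these toy labels there are 3 true positives, 1 false positive, and 1 false negative, so precision and recall are both 0.75 and the F1-score is 0.75.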

This research was supported by the National Fund for Scientific and Technological Development and Innovation (Fondecyt-Perú) within the framework of the “Project of 50 E038-2019-01-BM Improvement and Expansion of the Services of the National System of Science, Technology and Technological Innovation” [Grant 028-2019-FONDECYT-BM-INC.INV].

Notes

1. MEX-A3T: Authorship and aggressiveness analysis in the Mexican Spanish case study.
2. Association for Computational Linguistics conference.
3. MEX-A3T site: https://mexa3t.wixsite.com/home.
4. II Trolling Aggressive and Cyberbullying workshop.
5. Iberian Languages Evaluation Forum 2020.
6. BETO is a BERT model pre-trained on a large Spanish corpus [9].
7. http://www.nltk.org/api/nltk.tokenize.html.
8. https://scikit-learn.org/.../sklearn.util..._weight.compute_class_weight.html.

References

Álvarez-Carmona, M.Á., et al.: Overview of MEX-A3T at IberEval 2018: authorship and aggressiveness analysis in Mexican Spanish tweets. In: Notebook Papers of 3rd SEPLN Workshop on Evaluation of Human Language Technologies for Iberian Languages (IBEREVAL), Seville, Spain, vol. 6 (2018)
Aragón, M.E., Álvarez-Carmona, M.Á., Montes-y Gómez, M., Escalante, H.J., Villasenor-Pineda, L., Moctezuma, D.: Overview of MEX-A3T at IberLEF 2019: authorship and aggressiveness analysis in Mexican Spanish tweets. In: Notebook Papers of 1st SEPLN Workshop on Iberian Languages Evaluation Forum (IberLEF), Bilbao, Spain (2019)
Aragón, M., et al.: Overview of MEX-A3T at IberLEF 2020: fake news and aggressiveness analysis in Mexican Spanish. In: Notebook Papers of 2nd SEPLN Workshop on Iberian Languages Evaluation Forum (IberLEF), Malaga, Spain (2020)
Arce-Cardenas, S., Fajardo-Delgado, D., Álvarez-Carmona, M.Á.: TecNM at MEX-A3T 2020: Fake news and aggressiveness analysis in Mexican Spanish (2020)
Plaza-del Arco, F.M., Molina-González, M.D., Ureña-López, L.A., Martín-Valdivia, M.T.: Comparing pre-trained language models for Spanish hate speech detection. Expert Syst. Appl. 166, 114120 (2021)
Badjatiya, P., Gupta, S., Gupta, M., Varma, V.: Deep learning for hate speech detection in tweets. In: Proceedings of the 26th International Conference on World Wide Web Companion, pp. 759–760. International World Wide Web Conferences Steering Committee (2017)
Battaglia, P.W., et al.: Relational inductive biases, deep learning, and graph networks. arXiv preprint arXiv:1806.01261 (2018)
Bouma, G.: Normalized (pointwise) mutual information in collocation extraction. In: Proceedings of GSCL, pp. 31–40 (2009)
Canete, J., Chaperon, G., Fuentes, R., Pérez, J.: Spanish pre-trained BERT model and evaluation data. In: PML4DC at ICLR 2020 (2020)
Casavantes, M., López, R., González, L.: UACh at MEX-A3T 2020: detecting aggressive tweets by incorporating author and message context. In: Notebook Papers of 2nd SEPLN Workshop on Iberian Languages Evaluation Forum (IberLEF), Malaga, Spain (2020)
Davidson, T., Warmsley, D., Macy, M., Weber, I.: Automated hate speech detection and the problem of offensive language. In: Eleventh International AAAI Conference on Web and Social Media (2017)
Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018)
Díaz-Torres, M.J., Morán-Méndez, P.A., Villasenor-Pineda, L., Montes, M., Aguilera, J., Meneses-Lerín, L.: Automatic detection of offensive language in social media: defining linguistic criteria to build a Mexican Spanish dataset. In: Proceedings of the Second Workshop on Trolling, Aggression and Cyberbullying, pp. 132–136 (2020)
Garrido-Espinosa, M., Rosales-Pérez, A., López-Monroy, A.: GRU with author profiling information to detect aggressiveness. In: Notebook Papers of 2nd SEPLN Workshop on Iberian Languages Evaluation Forum (IberLEF), Malaga, Spain (2020)
Guzman-Silverio, M., Balderas-Paredes, A., López-Monroy, A.: Transformers and data augmentation for aggressiveness detection in Mexican Spanish. In: Notebook Papers of 2nd SEPLN Workshop on Iberian Languages Evaluation Forum (IberLEF), Malaga, Spain (2020)
Jeong, C., Jang, S., Park, E., Choi, S.: A context-aware citation recommendation model with BERT and graph convolutional networks. Scientometrics 124(3), 1907–1922 (2020)
Kipf, T.N., Welling, M.: Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907 (2016)
Kumar, R., et al.: Proceedings of the second workshop on trolling, aggression and cyberbullying (2020)
Lever, J., Krzywinski, M., Altman, N.: Classification evaluation (2016)
Lu, Z., Du, P., Nie, J.-Y., et al.: VGCN-BERT: augmenting BERT with graph embedding for text classification. In: Jose, J.M. (ed.) ECIR 2020. LNCS, vol. 12035, pp. 369–382. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-45439-5_25
Park, J.H., Fung, P.: One-step and two-step classification for abusive language detection on twitter. arXiv preprint arXiv:1706.01206 (2017)
Samghabadi, N.S., Patwa, P., Srinivas, P., Mukherjee, P., Das, A., Solorio, T.: Aggression and misogyny detection using BERT: a multi-task approach. In: Proceedings of the Second Workshop on Trolling, Aggression and Cyberbullying, pp. 126–131 (2020)
Schmidt, A., Wiegand, M.: A survey on hate speech detection using natural language processing. In: Proceedings of the Fifth International Workshop on Natural Language Processing for Social Media, pp. 1–10 (2017)
Shang, J., Ma, T., Xiao, C., Sun, J.: Pre-training of graph augmented transformers for medication recommendation. arXiv preprint arXiv:1906.00346 (2019)
Tanase, M.A., Zaharia, G.E., Cercel, D.C., Dascalu, M.: Detecting aggressiveness in Mexican Spanish social media content by fine-tuning transformer-based models (2020)
Vaswani, A., et al.: Attention is all you need. arXiv preprint arXiv:1706.03762 (2017)
Villatoro-Tello, E., Ramírez-de-la Rosa, G., Kumar, S., Parida, S., Motlicek, P.: Idiap and UAM participation at MEX-A3T evaluation campaign. In: Notebook Papers of 2nd SEPLN Workshop on Iberian Languages Evaluation Forum (IberLEF), Malaga, Spain (2020)
Waseem, Z., Davidson, T., Warmsley, D., Weber, I.: Understanding abuse: a typology of abusive language detection subtasks. arXiv preprint arXiv:1705.09899 (2017)
Waseem, Z., Hovy, D.: Hateful symbols or hateful people? Predictive features for hate speech detection on Twitter. In: Proceedings of the NAACL Student Research Workshop, pp. 88–93 (2016)
Yao, L., Mao, C., Luo, Y.: Graph convolutional networks for text classification. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, pp. 7370–7377 (2019)
Zampieri, M., Malmasi, S., Nakov, P., Rosenthal, S., Farra, N., Kumar, R.: SemEval-2019 task 6: identifying and categorizing offensive language in social media (OffensEval). arXiv preprint arXiv:1903.08983 (2019)
Zhang, Z., Luo, L.: Hate speech detection: a solved problem? the challenging case of long tail on twitter. Semant. Web 10(5), 925–945 (2019)

Author information

Authors and Affiliations

Department of Computer Science, Universidad Católica San Pablo, Arequipa, Peru
Errol Mamani-Condori & José Ochoa-Luna

Corresponding author

Correspondence to Errol Mamani-Condori.

Editor information

Editors and Affiliations

Universidade Federal de Sergipe, São Cristóvão, Brazil
André Britto
Universidade de São Paulo, São Paulo, Brazil
Karina Valdivia Delgado

About this paper

Cite this paper

Mamani-Condori, E., Ochoa-Luna, J. (2021). Aggressive Language Detection Using VGCN-BERT for Spanish Texts. In: Britto, A., Valdivia Delgado, K. (eds) Intelligent Systems. BRACIS 2021. Lecture Notes in Computer Science, vol 13074. Springer, Cham. https://doi.org/10.1007/978-3-030-91699-2_25

DOI: https://doi.org/10.1007/978-3-030-91699-2_25
Published: 28 November 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-91698-5
Online ISBN: 978-3-030-91699-2
eBook Packages: Computer Science, Computer Science (R0)