Aggressive Language Detection Using VGCN-BERT for Spanish Texts

  • Conference paper
  • Published in: Intelligent Systems (BRACIS 2021)

Abstract

The growing influence of social media users has led to the wide dissemination of aggressive content over the internet. To tackle this problem, recent work in Aggressive Language Detection has shown that Deep Learning techniques perform well. Recently, Transformer-based architectures such as Bidirectional Encoder Representations from Transformers (BERT) have outperformed previous aggressive text detection baselines. However, most Transformer-based approaches are unable to properly capture global information such as the language vocabulary. Thus, in this work, we focus on aggressive content detection using a combination of a Vocabulary Graph Convolutional Network (VGCN), which captures global information, and BERT, which models local information. This combined approach, called VGCN-BERT, allows us to improve the feature-level representation for Spanish aggressive language detection. Our experiments were performed on the MEX-A3T aggressiveness benchmark dataset, which is composed of aggressive and non-aggressive tweets written in the Mexican Spanish variant. We report an F1-score of 86.46% with the VGCN-BERT approach, a result comparable to the current state of the art, a BERT ensemble, for detecting aggressive content in the MEX-A3T 2020 track.
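For readers unfamiliar with the metric reported above, the following is a minimal sketch of how an F1-score can be computed for a single positive class (here, "aggressive"). The abstract does not state which averaging variant the authors use, so the single-class form is an assumption, and the labels are invented toy data, not MEX-A3T data. The function name `f1_score` is illustrative only.

```python
def f1_score(y_true, y_pred, positive=1):
    """F1 for one positive class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels (made up for illustration): 1 = aggressive, 0 = non-aggressive.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

score = f1_score(y_true, y_pred)
print(f"F1 = {score:.4f}")
```

In this toy case there are three true positives, one false positive, and one false negative, so precision and recall are both 0.75 and the F1-score is 0.75.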

This research was supported by the National Fund for Scientific and Technological Development and Innovation (Fondecyt-Perú) within the framework of the “Project of 50 E038-2019-01-BM Improvement and Expansion of the Services of the National System of Science, Technology and Technological Innovation” [Grant 028-2019-FONDECYT-BM-INC.INV].

Notes

  1. MEX-A3T: Authorship and aggressiveness analysis in the Mexican Spanish case study.

  2. Association for Computational Linguistics conference.

  3. MEX-A3T site: https://mexa3t.wixsite.com/home.

  4. Second Workshop on Trolling, Aggression and Cyberbullying.

  5. Iberian Languages Evaluation Forum 2020.

  6. BETO is a BERT model pre-trained on a large Spanish corpus [9].

  7. http://www.nltk.org/api/nltk.tokenize.html.

  8. https://scikit-learn.org/.../sklearn.util..._weight.compute_class_weight.html.
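Note 8 points to scikit-learn's `compute_class_weight` utility (the URL is truncated in the source). How the authors call it is not shown on this page. As a minimal sketch, assuming the "balanced" heuristic is what is meant, the weights can be reproduced in plain Python as n_samples / (n_classes * count of each class). The helper name `balanced_class_weights` and the label counts below are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    This mirrors the "balanced" heuristic: rarer classes get larger weights.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * k) for cls, k in counts.items()}


# Hypothetical imbalanced label set: 7 non-aggressive (0), 3 aggressive (1).
labels = [0] * 7 + [1] * 3
weights = balanced_class_weights(labels)
print(weights)
```

Weights of this kind are typically passed to a weighted loss so that errors on the minority class count for more during training.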

References

  1. Álvarez-Carmona, M.Á., et al.: Overview of MEX-A3T at IberEval 2018: authorship and aggressiveness analysis in Mexican Spanish tweets. In: Notebook Papers of 3rd SEPLN Workshop on Evaluation of Human Language Technologies for Iberian Languages (IBEREVAL), Seville, Spain, vol. 6 (2018)

  2. Aragón, M.E., Álvarez-Carmona, M.Á., Montes-y Gómez, M., Escalante, H.J., Villasenor-Pineda, L., Moctezuma, D.: Overview of MEX-A3T at IberLEF 2019: authorship and aggressiveness analysis in Mexican Spanish tweets. In: Notebook Papers of 1st SEPLN Workshop on Iberian Languages Evaluation Forum (IberLEF), Bilbao, Spain (2019)

  3. Aragón, M., et al.: Overview of MEX-A3T at IberLEF 2020: fake news and aggressiveness analysis in Mexican Spanish. In: Notebook Papers of 2nd SEPLN Workshop on Iberian Languages Evaluation Forum (IberLEF), Malaga, Spain (2020)

  4. Arce-Cardenas, S., Fajardo-Delgado, D., Álvarez-Carmona, M.Á.: TecNM at MEX-A3T 2020: Fake news and aggressiveness analysis in Mexican Spanish (2020)

  5. Plaza-del Arco, F.M., Molina-González, M.D., Ureña-López, L.A., Martín-Valdivia, M.T.: Comparing pre-trained language models for Spanish hate speech detection. Expert Syst. Appl. 166, 114120 (2021)

  6. Badjatiya, P., Gupta, S., Gupta, M., Varma, V.: Deep learning for hate speech detection in tweets. In: Proceedings of the 26th International Conference on World Wide Web Companion, pp. 759–760. International World Wide Web Conferences Steering Committee (2017)

  7. Battaglia, P.W., et al.: Relational inductive biases, deep learning, and graph networks. arXiv preprint arXiv:1806.01261 (2018)

  8. Bouma, G.: Normalized (pointwise) mutual information in collocation extraction. In: Proceedings of GSCL, pp. 31–40 (2009)

  9. Cañete, J., Chaperon, G., Fuentes, R., Pérez, J.: Spanish pre-trained BERT model and evaluation data. In: PML4DC at ICLR 2020 (2020)

  10. Casavantes, M., López, R., González, L.: UACh at MEX-A3T 2020: detecting aggressive tweets by incorporating author and message context. In: Notebook Papers of 2nd SEPLN Workshop on Iberian Languages Evaluation Forum (IberLEF), Malaga, Spain (2020)

  11. Davidson, T., Warmsley, D., Macy, M., Weber, I.: Automated hate speech detection and the problem of offensive language. In: Eleventh International AAAI Conference on Web and Social Media (2017)

  12. Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018)

  13. Díaz-Torres, M.J., Morán-Méndez, P.A., Villasenor-Pineda, L., Montes, M., Aguilera, J., Meneses-Lerín, L.: Automatic detection of offensive language in social media: defining linguistic criteria to build a Mexican Spanish dataset. In: Proceedings of the Second Workshop on Trolling, Aggression and Cyberbullying, pp. 132–136 (2020)

  14. Garrido-Espinosa, M., Rosales-Pérez, A., López-Monroy, A.: GRU with author profiling information to detect aggressiveness. In: Notebook Papers of 2nd SEPLN Workshop on Iberian Languages Evaluation Forum (IberLEF), Malaga, Spain (2020)

  15. Guzman-Silverio, M., Balderas-Paredes, A., López-Monroy, A.: Transformers and data augmentation for aggressiveness detection in Mexican Spanish. In: Notebook Papers of 2nd SEPLN Workshop on Iberian Languages Evaluation Forum (IberLEF), Malaga, Spain (2020)

  16. Jeong, C., Jang, S., Park, E., Choi, S.: A context-aware citation recommendation model with BERT and graph convolutional networks. Scientometrics 124(3), 1907–1922 (2020)

  17. Kipf, T.N., Welling, M.: Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907 (2016)

  18. Kumar, R., et al.: Proceedings of the second workshop on trolling, aggression and cyberbullying (2020)

  19. Lever, J., Krzywinski, M., Altman, N.: Classification evaluation (2016)

  20. Lu, Z., Du, P., Nie, J.-Y., et al.: VGCN-BERT: augmenting BERT with graph embedding for text classification. In: Jose, J.M. (ed.) ECIR 2020. LNCS, vol. 12035, pp. 369–382. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-45439-5_25

  21. Park, J.H., Fung, P.: One-step and two-step classification for abusive language detection on twitter. arXiv preprint arXiv:1706.01206 (2017)

  22. Samghabadi, N.S., Patwa, P., Srinivas, P., Mukherjee, P., Das, A., Solorio, T.: Aggression and misogyny detection using BERT: a multi-task approach. In: Proceedings of the Second Workshop on Trolling, Aggression and Cyberbullying, pp. 126–131 (2020)

  23. Schmidt, A., Wiegand, M.: A survey on hate speech detection using natural language processing. In: Proceedings of the Fifth International Workshop on Natural Language Processing for Social Media, pp. 1–10 (2017)

  24. Shang, J., Ma, T., Xiao, C., Sun, J.: Pre-training of graph augmented transformers for medication recommendation. arXiv preprint arXiv:1906.00346 (2019)

  25. Tanase, M.A., Zaharia, G.E., Cercel, D.C., Dascalu, M.: Detecting aggressiveness in Mexican Spanish social media content by fine-tuning transformer-based models (2020)

  26. Vaswani, A., et al.: Attention is all you need. arXiv preprint arXiv:1706.03762 (2017)

  27. Villatoro-Tello, E., Ramírez-de-la Rosa, G., Kumar, S., Parida, S., Motlicek, P.: Idiap and UAM participation at MEX-A3T evaluation campaign. In: Notebook Papers of 2nd SEPLN Workshop on Iberian Languages Evaluation Forum (IberLEF), Malaga, Spain (2020)

  28. Waseem, Z., Davidson, T., Warmsley, D., Weber, I.: Understanding abuse: a typology of abusive language detection subtasks. arXiv preprint arXiv:1705.09899 (2017)

  29. Waseem, Z., Hovy, D.: Hateful symbols or hateful people? Predictive features for hate speech detection on Twitter. In: Proceedings of the NAACL Student Research Workshop, pp. 88–93 (2016)

  30. Yao, L., Mao, C., Luo, Y.: Graph convolutional networks for text classification. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, pp. 7370–7377 (2019)

  31. Zampieri, M., Malmasi, S., Nakov, P., Rosenthal, S., Farra, N., Kumar, R.: SemEval-2019 task 6: identifying and categorizing offensive language in social media (OffensEval). arXiv preprint arXiv:1903.08983 (2019)

  32. Zhang, Z., Luo, L.: Hate speech detection: a solved problem? the challenging case of long tail on twitter. Semant. Web 10(5), 925–945 (2019)

Author information

Corresponding author

Correspondence to Errol Mamani-Condori.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Mamani-Condori, E., Ochoa-Luna, J. (2021). Aggressive Language Detection Using VGCN-BERT for Spanish Texts. In: Britto, A., Valdivia Delgado, K. (eds) Intelligent Systems. BRACIS 2021. Lecture Notes in Computer Science, vol 13074. Springer, Cham. https://doi.org/10.1007/978-3-030-91699-2_25

  • DOI: https://doi.org/10.1007/978-3-030-91699-2_25

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-91698-5

  • Online ISBN: 978-3-030-91699-2

  • eBook Packages: Computer Science, Computer Science (R0)
