How to identify Cyberbullying with Machine Learning

M. L. Fujimoto; M. Gaseta; S. O. Rezende; R. A. F. Romero

doi:10.5753/kdmile.2024.244087

M. L. Fujimoto USP
M. Gaseta USP
S. O. Rezende USP
R. A. F. Romero USP

Abstract

Cyberbullying is a form of bullying that has become a growing concern with the exponential increase in social media users. Social networks give bullies a convenient environment in which to attack their victims, who can suffer serious psychological harm. Mitigating the problem requires proactive measures that detect and prevent cyberbullying before harmful content spreads. With this concern in mind, this article proposes an approach that combines TF-IDF with machine learning models to identify cyberbullying automatically. The models are evaluated on how well they identify and classify cyberbullying instances, using metrics such as accuracy and F1-score. The research aims to contribute to the development of automated systems capable of preemptively addressing cyberbullying on social media platforms.
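The pipeline summarized above (TF-IDF features, a classifier, evaluation with accuracy and F1-score) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the models used, so a nearest-centroid classifier stands in as an assumption, and the toy messages, labels, and function names are all invented for illustration. Only the Python standard library is used.

```python
import math
from collections import Counter

def tokenize(text):
    # Naive whitespace tokenizer (assumption; real systems need more care).
    return text.lower().split()

def fit_idf(docs):
    # Smoothed inverse document frequency for each training term.
    n = len(docs)
    df = Counter(term for doc in docs for term in set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1.0
            for term, count in df.items()}

def tfidf(doc, idf):
    # L2-normalised sparse TF-IDF vector; terms unseen in training are dropped.
    tf = Counter(t for t in tokenize(doc) if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}

def centroid(vectors):
    # Mean of a list of sparse vectors.
    total = Counter()
    for v in vectors:
        for t, w in v.items():
            total[t] += w
    return {t: w / len(vectors) for t, w in total.items()}

def dot(a, b):
    return sum(w * b.get(t, 0.0) for t, w in a.items())

def predict(doc, idf, centroids):
    # Stand-in classifier: pick the class whose centroid is most similar.
    vec = tfidf(doc, idf)
    return max(centroids, key=lambda label: dot(vec, centroids[label]))

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1_score(y_true, y_pred, positive=1):
    # F1 is the harmonic mean of precision and recall for the positive class.
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Invented toy data: 1 = cyberbullying, 0 = not cyberbullying.
train_texts = [
    "you are so stupid and ugly",
    "nobody likes you loser",
    "great game last night",
    "see you at the library tomorrow",
]
train_labels = [1, 1, 0, 0]
test_texts = ["you stupid loser", "great library"]
test_labels = [1, 0]

idf = fit_idf(train_texts)
centroids = {
    label: centroid([tfidf(text, idf)
                     for text, y in zip(train_texts, train_labels) if y == label])
    for label in sorted(set(train_labels))
}
predictions = [predict(text, idf, centroids) for text in test_texts]

print("accuracy:", accuracy(test_labels, predictions))
print("F1-score:", f1_score(test_labels, predictions))
```

With so few invented examples the scores say nothing about real performance; the sketch only shows how the pieces fit together. F1-score is reported next to accuracy because cyberbullying data is often imbalanced, and accuracy alone can look high while the minority class is missed.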

Keywords: machine learning, cyberbullying, natural language processing
