An Approach for Automatic Fraud Detection in Instant Messaging Applications (Uma Abordagem para Detecção Automática de Fraudes em Aplicativos de Mensagens Instantâneas)
Abstract
Instant messaging applications enable easy and efficient communication. However, they also facilitate the widespread dissemination of cyber threats such as financial fraud. In this context, detecting fraud quickly and effectively in texts shared on instant messaging applications is essential to preventing financial losses. This work presents two publicly available labeled datasets of Brazilian Portuguese (PT-BR) messages, including fraudulent ones, collected from public groups on WhatsApp and Telegram: FraudWhatsApp.Br and FraudTelegram.Br, respectively. Additionally, we conducted a series of text classification experiments combining two feature extraction methods, three token generation strategies, two forms of preprocessing, and nine classification algorithms to classify texts as fraudulent or non-fraudulent. Our best results achieved an F1-score of 0.99 on both the FraudTelegram.Br and FraudWhatsApp.Br datasets, showing the feasibility of the proposed approach.
Published
2023-09-18
How to Cite
NASCIMENTO, Alexsandro; GADELHA, Thiago; MONTEIRO, José Maria; MACHADO, Javam. Uma Abordagem para Detecção Automática de Fraudes em Aplicativos de Mensagens Instantâneas. In: BRAZILIAN SYMPOSIUM ON CYBERSECURITY (SBSEG), 23., 2023, Juiz de Fora/MG. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2023. p. 251-264. DOI: https://doi.org/10.5753/sbseg.2023.233611.
