Evaluation of Data Balancing Techniques for Fraud Detection in Online Credit Card Transactions

  • Arthur Cavalcanti, Centro Federal de Educação Tecnológica Celso Suckow da Fonseca (CEFET/RJ)
  • Diego Brandão, Centro Federal de Educação Tecnológica Celso Suckow da Fonseca (CEFET/RJ)
  • Eduardo Bezerra, Centro Federal de Educação Tecnológica Celso Suckow da Fonseca (CEFET/RJ)
  • Rafaelli Coutinho, Centro Federal de Educação Tecnológica Celso Suckow da Fonseca (CEFET/RJ)

Abstract

With the growth of e-commerce and credit card use, credit card fraud has become a major challenge for the parties involved. Despite the losses they cause, fraudulent transactions still account for only a small fraction of all transactions, which creates a class imbalance problem for fraud detection in the financial system. This work evaluates several combinations of feature selection techniques, class balancing techniques, and classification algorithms. To balance the classes, undersampling, oversampling, and adjustment of the classifiers' decision thresholds were used. The combinations were tested on two imbalanced datasets and evaluated with the F1 score. The results show a performance gain when data balancing techniques and classification threshold optimization are applied.
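As an illustration of the three balancing strategies named in the abstract, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes binary labels with fraud encoded as 1 (the minority class), purely random resampling, and a simple grid search over thresholds. The function names (`random_undersample`, `random_oversample`, `best_threshold`) and the threshold grid are hypothetical choices made for this example only.

```python
import random


def f1_score(y_true, y_pred):
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when that denominator is zero."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def random_undersample(X, y, seed=0):
    """Keep every minority row (label 1) and an equal-sized random subset of majority rows."""
    rng = random.Random(seed)
    minority = [i for i, label in enumerate(y) if label == 1]
    majority = [i for i, label in enumerate(y) if label == 0]
    kept = minority + rng.sample(majority, min(len(minority), len(majority)))
    rng.shuffle(kept)
    return [X[i] for i in kept], [y[i] for i in kept]


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen minority rows until both classes have the same size."""
    rng = random.Random(seed)
    minority = [i for i, label in enumerate(y) if label == 1]
    majority = [i for i, label in enumerate(y) if label == 0]
    if not minority:  # nothing to duplicate
        return list(X), list(y)
    extra = [rng.choice(minority) for _ in range(max(0, len(majority) - len(minority)))]
    kept = majority + minority + extra
    rng.shuffle(kept)
    return [X[i] for i in kept], [y[i] for i in kept]


def best_threshold(y_true, scores, candidates=None):
    """Return the (threshold, F1) pair with the highest F1 over the candidate thresholds.

    A transaction is predicted as fraud when its score is >= threshold.
    Ties are resolved in favor of the first (lowest) candidate.
    """
    if candidates is None:
        candidates = [i / 100 for i in range(1, 100)]  # assumed grid: 0.01 .. 0.99
    best_t, best_f1 = 0.5, -1.0
    for t in candidates:
        preds = [1 if s >= t else 0 for s in scores]
        f1 = f1_score(y_true, preds)
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1
```

In practice the resampling functions would be applied to the training split only, and the threshold would be chosen on held-out validation scores, so that the reported F1 score is not inflated by tuning on the test data.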

Keywords: Data Balancing Techniques, Credit Card Fraud Detection

Published
14 October 2024
CAVALCANTI, Arthur; BRANDÃO, Diego; BEZERRA, Eduardo; COUTINHO, Rafaelli. Avaliação de Técnicas de Balanceamento de Dados na Detecção de Fraude em Transações Online de Cartão de Crédito. In: SIMPÓSIO BRASILEIRO DE BANCO DE DADOS (SBBD), 39., 2024, Florianópolis/SC. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2024. p. 694-700. ISSN 2763-8979. DOI: https://doi.org/10.5753/sbbd.2024.243462.