A Machine Learning-based System for Financial Fraud Detection
Abstract
Companies created for money laundering or as a means of tax evasion are harmful to a country's economy and society. Governmental agencies usually tackle this problem by having officials pore over companies' financial data and single out those that exhibit fraudulent behavior. Such work tends to be slow and tedious. This paper proposes a machine learning-based system that classifies whether a company is likely to be involved in fraud. Using financial and tax data from various companies, four classifiers were trained and then used to flag likely fraud: k-Nearest Neighbors, Random Forest, Support Vector Machine (SVM), and a Neural Network. The best-performing model, the Random Forest, achieved a macro-averaged F1-score of 92.98%.
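For readers unfamiliar with the reported metric, the following is a minimal sketch of how a macro-averaged F1-score can be computed: the F1-score is calculated separately for each class and the per-class scores are averaged with equal weight, so the minority (fraud) class counts as much as the majority class. The function name `macro_f1` and the label vectors are hypothetical and chosen only for illustration; they are not taken from the paper's data or code.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Return the unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1   # correct prediction for class t
        else:
            fp[p] += 1   # class p was predicted wrongly
            fn[t] += 1   # class t was missed
    scores = []
    for c in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0
        denom = 2 * tp[c] + fp[c] + fn[c]
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical labels: 1 = company flagged as likely fraudulent, 0 = not flagged.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 1, 1, 1, 0]
print(f"macro F1 = {macro_f1(y_true, y_pred):.4f}")  # prints "macro F1 = 0.7917"
```

In this toy example the per-class scores are 5/6 for class 0 and 3/4 for class 1, and their plain average is about 0.7917. With scikit-learn, the same quantity is available as `f1_score(y_true, y_pred, average="macro")`.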