Impact of Unusual Features in Credit Scoring Problem

Luiz Felipe Vercosa; Rodrigo Lira; Rodrigo Monteiro; Kleber Silva; Jailson Magalhaes; Alexandre Maciel; Byron L. D. Bezerra; Carmelo Bastos-Filho

doi:10.5753/kdmile.2020.11962

Luiz Felipe Vercosa, Universidade de Pernambuco
Rodrigo Lira, Instituto Federal de Pernambuco
Rodrigo Monteiro, Universidade Federal de Pernambuco
Kleber Silva, Universidade de Pernambuco
Jailson Magalhaes, Universidade de Pernambuco
Alexandre Maciel, Universidade de Pernambuco
Byron L. D. Bezerra, Universidade de Pernambuco
Carmelo Bastos-Filho, Universidade de Pernambuco

Abstract

Standard features used for credit scoring consist mainly of registration and financial data from customers. However, exploring new features is of great interest to financial companies, since even slight improvements in a person's score directly affect company revenue. In this work, we categorize the features of open credit scoring datasets and compare them with the features found in a real company dataset. The company dataset contains unusual feature groups such as historical, geolocation, web behavior, and demographic data. We performed bivariate tests on the features using the Kolmogorov-Smirnov metric to assess the performance of each feature group. We also generated a good-payer score using the AdaBoost, Multilayer Perceptron, and XGBoost algorithms. We then analyzed the results with different metrics and compared them with the company's own results. Our main finding was that the unusual features yielded a small improvement over the features in current datasets. We also identified the most promising feature groups and observed that the tuned XGBoost outperformed the company solution on three of the four metrics used.
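The bivariate Kolmogorov-Smirnov assessment mentioned above can be illustrated with a minimal sketch. It assumes the common credit-scoring reading of the KS metric: the largest gap between the empirical distributions of a feature for good and bad payers. The function names, the row layout (a list of dicts), and the label encoding (1 = good payer, 0 = bad payer) are hypothetical choices for illustration and are not taken from the paper.

```python
from bisect import bisect_right


def ks_statistic(good_values, bad_values):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    if not good_values or not bad_values:
        raise ValueError("both samples must be non-empty")
    good = sorted(good_values)
    bad = sorted(bad_values)
    max_gap = 0.0
    # The empirical CDFs only change at observed values, so it is enough
    # to compare them at every distinct value seen in either sample.
    for x in set(good) | set(bad):
        cdf_good = bisect_right(good, x) / len(good)
        cdf_bad = bisect_right(bad, x) / len(bad)
        max_gap = max(max_gap, abs(cdf_good - cdf_bad))
    return max_gap


def rank_features_by_ks(rows, labels, feature_names):
    """Rank numeric features by how well they separate good from bad payers.

    rows: list of dicts mapping feature name to a numeric value (assumed layout).
    labels: 1 for a good payer, 0 for a bad payer (assumed encoding).
    """
    scores = {}
    for name in feature_names:
        good = [row[name] for row, y in zip(rows, labels) if y == 1]
        bad = [row[name] for row, y in zip(rows, labels) if y == 0]
        scores[name] = ks_statistic(good, bad)
    # Higher KS means the feature separates the two classes more strongly.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

A feature whose values never overlap between the two classes scores 1.0, and a feature with identical distributions in both classes scores 0.0. The same statistic can be applied to a model's output score instead of a raw feature.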

Keywords: credit scoring, feature groups, novel dataset, web crawling