Credit scoring development in the light of the new Brazilian General Data Protection Law

Robinson A. A. de Oliveira-Junior (USP)

DOI: https://doi.org/10.5753/kdmile.2021.17462

Abstract

The new Brazilian General Data Protection Law (LGPD) establishes a right to explanation of automated decisions, which may make models that humans cannot interpret, known as black boxes, unfeasible for credit risk assessment. To examine this, three methods commonly applied to credit scoring, namely logistic regression, decision tree, and support vector machine (SVM), were fitted to an anonymized sample of a consumer credit portfolio from a credit union. Their results were compared, and the adequacy of the explanation achievable with each classifier was assessed. For the SVM, which produced a black box model, a local interpretation method, SHapley Additive exPlanations (SHAP), was incorporated, enabling this machine learning classifier to meet the requirements imposed by the LGPD on par with the inherently comprehensible white box models.
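The local interpretation idea mentioned in the abstract can be illustrated with a minimal sketch. It does not reproduce the paper's SVM or the SHAP library; it computes exact Shapley values by brute-force enumeration for a toy scoring function. The function `black_box_score`, the feature names, the background rows, and the applicant are all hypothetical and chosen only for illustration.

```python
from itertools import combinations
from math import exp, factorial

# Hypothetical feature names; not taken from the paper's data set.
FEATURES = ["income", "debt_ratio", "late_payments"]


def black_box_score(row):
    """Hypothetical stand-in for an opaque classifier: returns a default probability."""
    income, debt_ratio, late = row
    z = -0.4 * income + 2.0 * debt_ratio + 0.8 * late + 0.5 * debt_ratio * late
    return 1.0 / (1.0 + exp(-z))


# Hypothetical background sample used to "switch off" features.
BACKGROUND = [
    (3.0, 0.2, 0),
    (1.5, 0.6, 2),
    (2.0, 0.4, 1),
    (4.0, 0.1, 0),
]


def coalition_value(model, x, subset, background):
    """Mean model output with features in `subset` fixed to x and the rest taken from background rows."""
    total = 0.0
    for b in background:
        mixed = tuple(x[i] if i in subset else b[i] for i in range(len(x)))
        total += model(mixed)
    return total / len(background)


def shapley_values(model, x, background):
    """Exact Shapley values by enumerating every coalition (feasible only for a few features)."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                gain = (coalition_value(model, x, s | {i}, background)
                        - coalition_value(model, x, s, background))
                phi[i] += weight * gain
    return phi


applicant = (1.0, 0.7, 3)  # hypothetical applicant
phi = shapley_values(black_box_score, applicant, BACKGROUND)
base_value = coalition_value(black_box_score, applicant, set(), BACKGROUND)
prediction = black_box_score(applicant)

for name, contribution in zip(FEATURES, phi):
    print(f"{name}: {contribution:+.4f}")
print(f"base value + contributions = {base_value + sum(phi):.4f}")
print(f"model output               = {prediction:.4f}")
```

The per-feature contributions add up to the difference between the applicant's score and the average score over the background sample. This additive property is what lets a single automated decision be explained feature by feature. Practical SHAP implementations approximate these values, since exact enumeration grows exponentially with the number of features.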

Keywords: Classification models, Credit scoring, Interpretability, LGPD, Machine learning
