A Financial Distress Prediction using a Non-stationary Dataset

  • Rubens Marques Chaves (Universidade de Brasília)
  • André Luis Debiaso Rossi (Universidade Estadual Paulista)
  • Luís Paulo Faina Garcia (Universidade de Brasília)


Financial distress prediction (FDP) is crucial to companies, investors, and authorities. However, most FDP studies have been based on stationary models, disregarding important challenges present in financial distress data, such as non-stationarity. A related gap is the lack of real-world datasets of economic-financial indicators organized chronologically. This study proposes a comprehensive dataset of 84 economic-financial indicators from the Brazilian Securities and Exchange Commission (CVM), organized in a non-stationary manner and validated by experiments using classification models. The results for the metrics AUC-ROC, AUC-PS, F1-Score and Gmean provide evidence that the dataset is suitable for FDP.

Keywords: Financial Distress, CVM, non-stationary, Machine Learning


CHAVES, Rubens Marques; ROSSI, André Luis Debiaso; GARCIA, Luís Paulo Faina. A Financial Distress Prediction using a Non-stationary Dataset. In: ENCONTRO NACIONAL DE INTELIGÊNCIA ARTIFICIAL E COMPUTACIONAL (ENIAC), 20., 2023, Belo Horizonte/MG. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2023. p. 300-314. ISSN 2763-9061. DOI: https://doi.org/10.5753/eniac.2023.234013.