Comparative Analysis of Implementations of Decision Tree Algorithms for Public Service Applications

  • Vinicius Rogério da Silva, ENCE / IBGE
  • Eduardo Corrêa Gonçalves, ENCE / IBGE

Abstract


Decision trees are widely used in public administration, where algorithmic predictions support managers in making decisions that can profoundly affect people's lives. In this paper, we present a comparative analysis of three open-source implementations, in Python and R, of two popular decision tree algorithms (CART and C4.5). We compare the models generated by these implementations with respect to predictive performance, training and classification time, and interpretability. The results are intended to support the use of the evaluated implementations in public service, as well as in other areas where interpretable classification models are desirable.
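
As background on the two algorithms named in the abstract, the sketch below contrasts the split criteria usually associated with them: Gini impurity decrease for CART and gain ratio for C4.5. It is a minimal illustration and not the code of any implementation evaluated in the paper. The function names and the toy data are hypothetical, and the attribute is treated as categorical with a multiway split, whereas CART itself builds binary splits.

```python
from collections import Counter
from math import log2


def gini(labels):
    """Gini impurity of a list of class labels (criterion associated with CART)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(items):
    """Shannon entropy in bits of a list of discrete items."""
    n = len(items)
    return -sum((c / n) * log2(c / n) for c in Counter(items).values())


def partition(values, labels):
    """Group class labels by the value of one categorical attribute."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    return list(groups.values())


def gini_decrease(values, labels):
    """Impurity decrease of a multiway split (a simplification of CART's binary splits)."""
    n = len(labels)
    parts = partition(values, labels)
    return gini(labels) - sum(len(p) / n * gini(p) for p in parts)


def gain_ratio(values, labels):
    """Information gain divided by split information (criterion associated with C4.5)."""
    n = len(labels)
    parts = partition(values, labels)
    gain = entropy(labels) - sum(len(p) / n * entropy(p) for p in parts)
    split_info = entropy(values)
    return gain / split_info if split_info > 0 else 0.0


# Hypothetical toy data: attr_a separates the classes perfectly, attr_b does not.
labels = ["pos", "pos", "neg", "neg"]
attr_a = ["x", "x", "y", "y"]
attr_b = ["x", "y", "x", "y"]

for name, attr in [("attr_a", attr_a), ("attr_b", attr_b)]:
    print(name, gini_decrease(attr, labels), gain_ratio(attr, labels))
```

On this toy data both criteria prefer attr_a, which separates the two classes, over attr_b, which leaves each branch evenly mixed. On real data the two criteria can rank attributes differently, which is one reason the trees produced by different algorithms and implementations may differ.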

Published
2021-10-25
SILVA, Vinicius Rogério da; GONÇALVES, Eduardo Corrêa. Análise Comparativa de Implementações de Algoritmos de Árvores de Decisão para Aplicações no Serviço Público. In: REGIONAL SCHOOL ON COMPUTING OF BAHIA, ALAGOAS, AND SERGIPE (ERBASE), 21., 2021, Maceió. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2021. p. 10-19. DOI: https://doi.org/10.5753/erbase.2021.20051.