Automatic Group Labeling with Decision Trees: A Comparative Approach
Abstract
The exponential growth in data volume demands efficient data analysis techniques. Clustering is central to such analysis, but interpreting the resulting groups is often difficult. Automated group labeling with decision trees can ease this difficulty. This study compares four decision tree algorithms for automated group labeling and shows that the choice of algorithm significantly influences performance: CHAID outperforms the other algorithms on the Iris and Seeds datasets, while C4.5 performs best on the Wine and Glass datasets. The results confirm the validity of the proposed model and highlight the importance of careful algorithm selection. They also underscore the potential of automated group labeling models and the need for further research to refine them and extend their application to other domains.
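As a rough illustration of the general idea of labeling groups with a tree-style split, the sketch below searches, for each cluster, for the single attribute threshold that best separates it from the remaining points and reports that rule as the cluster's label. It is not the model evaluated in the paper: it uses a one-level, CART-style Gini split rather than CHAID or C4.5, and the function names, attribute names, and toy data are all hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means perfectly pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, targets, names):
    """Return (weighted Gini, attribute name, threshold) of the purest binary split.

    Assumes at least one attribute takes two or more distinct values.
    """
    best = None
    n = len(rows)
    for j, name in enumerate(names):
        values = sorted({row[j] for row in rows})
        # Candidate thresholds are midpoints between consecutive distinct values.
        for low, high in zip(values, values[1:]):
            t = (low + high) / 2
            left = [y for row, y in zip(rows, targets) if row[j] <= t]
            right = [y for row, y in zip(rows, targets) if row[j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, name, t)
    return best


def label_clusters(rows, clusters, names):
    """Describe each cluster by a one-vs-rest threshold rule on one attribute."""
    labels = {}
    for k in sorted(set(clusters)):
        is_k = [c == k for c in clusters]
        _, name, t = best_split(rows, is_k, names)
        j = names.index(name)
        inside = [row[j] for row, flag in zip(rows, is_k) if flag]
        # Pick the side of the threshold that holds most of the cluster's points.
        below = sum(value <= t for value in inside)
        op = "<=" if 2 * below >= len(inside) else ">"
        labels[k] = f"{name} {op} {t:g}"
    return labels


# Hypothetical toy data: two attributes, cluster ids from any clustering step.
names = ["length", "width"]
rows = [(1.0, 3.0), (1.5, 3.4), (2.0, 3.1), (6.0, 3.3), (6.5, 3.0), (7.0, 3.2)]
clusters = [0, 0, 0, 1, 1, 1]

print(label_clusters(rows, clusters, names))
```

A full decision tree would split recursively and could yield multi-attribute labels; the single split here is only meant to show how a cluster assignment can be turned into a readable rule.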
Keywords:
Automatic labeling, Cluster interpretation, Decision trees
References
Bertsimas, D., Orfanoudaki, A., and Wiberg, H. M. (2020). Interpretable clustering: an optimization approach. Machine Learning, 110:89–138.
de Lima, B. V. A., Machado, V. P., and Lopes, L. A. (2015). Automatic labeling of social network users Scientia.Net through the machine learning supervised application. Social Network Analysis and Mining, 5:1–10.
Di Teodoro, G., Monaci, M., and Palagi, L. (2024). Unboxing tree ensembles for interpretability: a hierarchical visualization tool and a multivariate optimal re-built tree. EURO Journal on Computational Optimization, 12:100084.
Dimotikalis, Y., Karagrigoriou, A., Parpoula, C., and Skiadas, C. H. (2021). Applied Modeling Techniques and Data Analysis 1: Computational Data Analysis Methods and Tools. John Wiley & Sons, Incorporated, Newark.
Dua, D. and Graff, C. (2017). UCI machine learning repository.
Filho, F. I., Machado, V. P., Veras, R. D. M. S., Aires, K. R. T., and Montenegro Leal Silva, A. (2020). Group labeling methodology using distance-based data grouping algorithms. Revista de Informática Teórica e Aplicada, 27(1):48–61.
Fisher, R. A. (1936). The use of multiple measurements in taxonomic problems. Annals of Eugenics, 7(2):179–188.
Hara, S. and Hayashi, K. (2016). Making tree ensembles interpretable. arXiv preprint arXiv:1606.05390.
Lopes, L. A., Machado, V. P., Rabêlo, R. A., Fernandes, R. A., and Lima, B. V. (2016). Automatic labelling of clusters of discrete and continuous data with supervised machine learning. Knowledge-Based Systems, 106:231–241.
Lopes, L. A., Machado, V. P., and Rabêlo, R. A. L. (2013). Automatic labeling of groupings through supervised machine learning.
Lopes, L. A., Machado, V. P., and Rabelo, R. D. A. L. (2014). Automatic cluster labeling through artificial neural networks. In 2014 International Joint Conference on Neural Networks (IJCNN), pages 762–769. IEEE.
Machado, V. P., Ribeiro, V., and Rabêlo, R. (2015). Rotulação de grupos utilizando conjuntos fuzzy. In XII Simpósio Brasileiro de Automação Inteligente (SBAI), number 12, pages 355–360.
Moura, M., Veras, R., and Machado, V. (2022). CAIBAL: Cluster-attribute interdependency based automatic labeler. In Proceedings of the 37th ACM/SIGAPP Symposium on Applied Computing, SAC ’22, pages 1109–1116, New York, NY, USA. Association for Computing Machinery.
Russell, S. and Norvig, P. (2016). Artificial Intelligence: A Modern Approach. Pearson.
Serengil, S. I. (2021). Chefboost: A lightweight boosted decision tree framework. DOI: 10.5281/zenodo.5576203.
Silva, L. E. S., Machado, V. P., Araujo, S. S., de Lima, B. V. A., and Veras, R. d. M. S. (2021). Using regression error analysis and feature selection to automatic cluster labeling. In Progress in Artificial Intelligence: 20th EPIA Conference on Artificial Intelligence, EPIA 2021, Virtual Event, September 7–9, 2021, Proceedings 20, pages 376–388. Springer.
Published
17/11/2024
How to Cite
MEDEIROS, Manoel Messias P.; LUZ, Daniel de S.; VERAS, Rodrigo de Melo S. Automatic Group Labeling with Decision Trees: A Comparative Approach. In: ENCONTRO NACIONAL DE INTELIGÊNCIA ARTIFICIAL E COMPUTACIONAL (ENIAC), 21., 2024, Belém/PA. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2024. p. 787-798. ISSN 2763-9061. DOI: https://doi.org/10.5753/eniac.2024.245214.