Optimizing Random Forests through Leaf Weighting in Regression Trees

Caio Ponte, Carlos Caminha, Vasco Furtado
Universidade de Fortaleza

DOI: https://doi.org/10.5753/eniac.2020.12171

Abstract

Random Forest is a popular and effective algorithm for classification and regression problems. A Random Forest makes its predictions by giving every tree an equal contribution to the final result. This work proposes a new method for weighting regression trees, with the goal of improving the model's predictive power. Our strategy is motivated by the use of statistical dispersion measures, such as the standard deviation or the standard error of the mean, as indicators of prediction quality at the leaf. The proposed weighting strategy was compared with other weighting methods. In this comparison, it reduced the Mean Absolute Error in about 30% of the data sets studied.

Keywords: Random Forests, Tree Weighting, Machine Learning
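The idea of using leaf dispersion as a quality indicator can be illustrated with a small sketch. The abstract does not state the exact weighting formula, so the inverse-dispersion weights, the `eps` smoothing term, the treatment of single-sample leaves, and every function name below are assumptions made for illustration, not the authors' implementation. Each tree is represented only by the training targets of the leaf that the query sample reached.

```python
import math
import statistics


def leaf_dispersion(targets, measure="std"):
    """Dispersion of the training targets that fell into one leaf."""
    if len(targets) < 2:
        # Assumption: a single-sample leaf has no measurable spread,
        # so it is treated as maximally uncertain (weight becomes 0).
        return math.inf
    std = statistics.stdev(targets)  # sample standard deviation
    if measure == "std":
        return std
    if measure == "sem":
        return std / math.sqrt(len(targets))  # standard error of the mean
    raise ValueError(f"unknown measure: {measure}")


def uniform_forest_prediction(leaves):
    """Standard Random Forest regression: every tree counts equally."""
    return statistics.fmean([statistics.fmean(t) for t in leaves])


def weighted_forest_prediction(leaves, measure="std", eps=1e-6):
    """Average the per-tree leaf means with inverse-dispersion weights.

    Assumption: weight = 1 / (dispersion + eps), so leaves whose
    targets agree closely contribute more to the final prediction.
    """
    means = [statistics.fmean(t) for t in leaves]
    weights = [1.0 / (leaf_dispersion(t, measure) + eps) for t in leaves]
    total = sum(weights)
    if total == 0.0:
        # No leaf has a usable dispersion estimate: fall back to uniform.
        return statistics.fmean(means)
    return sum(w * m for w, m in zip(weights, means)) / total


if __name__ == "__main__":
    # Hypothetical leaves reached by one query sample in three trees.
    leaves = [
        [10.0, 10.2, 9.8],        # tight leaf, mean 10.0
        [4.0, 16.0, 10.0, 12.0],  # widely spread leaf, mean 10.5
        [9.9, 10.1],              # tight leaf, mean 10.0
    ]
    print("uniform :", uniform_forest_prediction(leaves))
    print("std     :", weighted_forest_prediction(leaves, "std"))
    print("sem     :", weighted_forest_prediction(leaves, "sem"))
```

In this toy input the widely spread leaf receives a small weight, so the weighted prediction moves toward the two tight leaves, while the uniform average is pulled further toward the noisy one. Whether this improves the error on real data is an empirical question that the paper's comparison addresses.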
