Boosting Bagged Extremely Randomized Trees in Parallel for Text Classification
Abstract
This work proposes a parallelization of BERT, an algorithm that combines boosting with bagging of extremely randomized trees for the automatic classification of textual datasets. With high-dimensional datasets, building the classifiers can become a computationally expensive task. Parallelism on graphics cards (GPUs) can mitigate this challenge: because GPUs offer high processing power, execution time can decrease considerably.
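To make the combination of boosting and bagging concrete, the following is a minimal sequential sketch in plain Python, not the BERT algorithm as published. It makes several simplifying assumptions: labels are binary in {-1, +1}, each tree is reduced to a depth-1 stump with a randomly drawn feature and threshold, and the boosting step uses the standard AdaBoost weight update. All function names (fit_stump, fit_bag, fit_boosted_bags, predict) are hypothetical and chosen only for illustration.

```python
import math
import random


def majority(labels, default):
    """Most frequent label in a list, or `default` if the list is empty."""
    return max(sorted(set(labels)), key=labels.count) if labels else default


def fit_stump(X, y, rng):
    """Extremely randomized stump: random feature and random threshold,
    with no search for the best split. Each side is labeled by majority."""
    f = rng.randrange(len(X[0]))
    values = [row[f] for row in X]
    t = rng.uniform(min(values), max(values))
    sides = {True: [], False: []}
    for row, label in zip(X, y):
        sides[row[f] <= t].append(label)
    default = majority(y, 1)
    leaf = {side: majority(ls, default) for side, ls in sides.items()}
    return f, t, leaf


def fit_bag(X, y, w, n_trees, rng):
    """Bagging: each stump is fit on a bootstrap sample drawn according to
    the current boosting weights. The stumps do not depend on each other."""
    n = len(X)
    bag = []
    for _ in range(n_trees):
        idx = rng.choices(range(n), weights=w, k=n)
        bag.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], rng))
    return bag


def predict_bag(bag, row):
    """Unweighted vote of the stumps in one bag."""
    votes = sum(leaf[row[f] <= t] for f, t, leaf in bag)
    return 1 if votes >= 0 else -1


def fit_boosted_bags(X, y, n_rounds=10, n_trees=25, seed=0):
    """Boosting: each round fits one bag, then reweights the examples so
    that the next round focuses on the ones the bag misclassified."""
    rng = random.Random(seed)
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(n_rounds):
        bag = fit_bag(X, y, w, n_trees, rng)
        preds = [predict_bag(bag, row) for row in X]
        err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the logarithm finite
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, bag))
        w = [wi * math.exp(-alpha * yi * p) for wi, p, yi in zip(w, preds, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def predict(model, row):
    """Weighted vote of the bags, using the boosting coefficients."""
    score = sum(alpha * predict_bag(bag, row) for alpha, bag in model)
    return 1 if score >= 0 else -1
```

In this sketch the loop inside fit_bag is the natural candidate for parallel execution, since the stumps of one bag are built independently of one another, whereas the boosting rounds in fit_boosted_bags must run in sequence because each round depends on the weights produced by the previous one. How the actual work maps this onto GPU threads is not specified here and is left open.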