Boosting Bagged Extremely Randomized Trees in Parallel for Text Classification

  • Júlio C. B. Pires IFGoiano
  • Wellington Santos Martins UFG
  • Daniel X. de Sousa IFG-Anápolis

Abstract


This work proposes a parallel implementation of BERT, an algorithm that combines boosting with bagging of extremely randomized trees for the automatic classification of textual datasets. Building classifiers on high-dimensional datasets can be computationally expensive. Running the algorithm in parallel on graphics cards can address this cost: because they offer high processing power, execution time can decrease considerably.
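The combination described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes binary labels, uses depth-one trees (stumps) with a random feature and a random cut point as the extremely randomized learners, and applies a plain AdaBoost weight update. All names (`build_stump`, `bag_predict`, `fit`, `predict`) and the toy data are hypothetical, and the CPU thread pool only stands in for the graphics-card parallelism the work targets. The point it shows is that the trees inside one bag are independent of each other, so they can be built concurrently, while the boosting rounds remain sequential.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor


def build_stump(X, y, w, seed):
    """One extremely randomized stump trained on a bootstrap sample."""
    rng = random.Random(seed)
    n = len(X)
    idx = [rng.randrange(n) for _ in range(n)]   # bagging: sample with replacement
    f = rng.randrange(len(X[0]))                 # random feature
    vals = [X[i][f] for i in idx]
    t = rng.uniform(min(vals), max(vals))        # random cut point, not optimized
    votes = {True: {}, False: {}}
    overall = {}
    for i in idx:
        for d in (votes[X[i][f] <= t], overall):
            d[y[i]] = d.get(y[i], 0.0) + w[i]    # boosting weights drive the leaf labels

    def best(d):
        return max(sorted(d), key=d.get)

    # An empty leaf falls back to the weighted majority of the whole sample.
    leaf = {side: best(v or overall) for side, v in votes.items()}
    return f, t, leaf


def bag_predict(bag, row):
    """Majority vote of the stumps in one bag."""
    votes = {}
    for f, t, leaf in bag:
        label = leaf[row[f] <= t]
        votes[label] = votes.get(label, 0) + 1
    return max(sorted(votes), key=votes.get)


def fit(X, y, rounds=10, bag_size=25, seed=0):
    """AdaBoost (binary form) whose weak learner is a bag of random stumps."""
    rng = random.Random(seed)
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    with ThreadPoolExecutor() as pool:
        for _ in range(rounds):                  # boosting rounds are sequential
            seeds = [rng.getrandbits(32) for _ in range(bag_size)]
            # Stumps within a bag are independent, so they can be built in parallel.
            bag = list(pool.map(lambda s: build_stump(X, y, w, s), seeds))
            preds = [bag_predict(bag, row) for row in X]
            err = sum(wi for wi, p, t in zip(w, preds, y) if p != t)
            if err >= 0.5:
                continue                         # discard a bag no better than chance
            err = max(err, 1e-10)
            alpha = 0.5 * math.log((1.0 - err) / err)
            ensemble.append((alpha, bag))
            w = [wi * math.exp(alpha if p != t else -alpha)
                 for wi, p, t in zip(w, preds, y)]
            total = sum(w)
            w = [wi / total for wi in w]
    return sorted(set(y)), ensemble


def predict(model, row):
    """Weighted vote of all bags, using the boosting coefficients as weights."""
    classes, ensemble = model
    score = {c: 0.0 for c in classes}
    for alpha, bag in ensemble:
        score[bag_predict(bag, row)] += alpha
    return max(classes, key=score.get)


# Toy data (hypothetical): the first feature separates the classes, the second is noise.
X = [[float(i), float(i * 7 % 5)] for i in range(20)]
y = [0] * 10 + [1] * 10
model = fit(X, y)
predictions = [predict(model, row) for row in X]
```

Each stump receives its own seed, so the result does not depend on the order in which the threads finish. In Python the thread pool gives no real speedup for this workload; it only marks the loop that a graphics-card implementation would distribute across many threads.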

Keywords: Bagged trees, BERT parallelization, boosting, graphics cards.

Published
2018-08-08
PIRES, Júlio C. B.; MARTINS, Wellington Santos; SOUSA, Daniel X. de. Impulsionando Árvores Extremamente Aleatórias Ensacadas em Paralelo para Classificação de Textos. In: REGIONAL SCHOOL ON INFORMATICS OF GOIÁS (ERI-GO), 2018, Goiânia. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2018. p. 325-330.