Evaluation of Code Styles for Decision Trees on GPUs with Microbenchmarks

Jeronimo Penha; Alysson K. C. da Silva; Olavo Barros; Icaro Moreira; José Augusto M. Nacif; Ricardo Ferreira

doi:10.5753/wscad.2023.235903

Affiliation (all authors): UFV (Universidade Federal de Viçosa)

Abstract

This work addresses the use of GPUs to speed up Random Forest algorithms. The study uses microbenchmarks developed to evaluate implementations of decision trees on GPUs. It concludes that, up to a depth of 6 levels, the implementation without branch instructions performs better. For greater depths, the implementation with branches is preferable, even when the branches diverge. Memory-based implementations lose performance because of indirection and memory read latencies greater than 20 cycles. In addition, the study found that many shallower trees are more efficient than a few deeper trees.
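
The three implementation styles compared above can be illustrated with a small sketch. It is a hypothetical example written in Python that shows only the shape of each style. It is not the authors' CUDA code, and it cannot reproduce any GPU performance effect. The example assumes a complete tree of depth 2 with invented features and thresholds. The names `predict_branch`, `predict_branchless`, `predict_memory`, `FEATURE`, `THRESHOLD` and `CHILDREN` are all hypothetical.

- **Branching style.** It evaluates d comparisons along one path, but threads in a warp can diverge.
- **Branch-free style.** It always evaluates all 2^d - 1 internal nodes. This is one plausible reason, not stated in the abstract, why it stops paying off as depth grows.
- **Memory-based style.** It replaces constants with dependent loads, which is the indirection the abstract associates with its performance loss.

```python
"""Three code styles for the same depth-2 decision tree (hypothetical example tree)."""

from typing import Sequence


# Style 1: hard-coded tree with branches (if/else).
# On a GPU, threads of a warp that take different paths diverge.
def predict_branch(x: Sequence[float]) -> int:
    if x[0] < 0.5:
        return 0 if x[1] < 0.3 else 1
    return 2 if x[1] < 0.7 else 3


# Style 2: hard-coded and branch-free. Every comparison is evaluated and turned
# into 0/1. The leaf index is then built with arithmetic instead of if/else.
# A complete tree of depth d needs 2**d - 1 comparisons in this style.
def predict_branchless(x: Sequence[float]) -> int:
    c0 = int(x[0] >= 0.5)       # root: 0 = go left, 1 = go right
    c_left = int(x[1] >= 0.3)   # left child, always evaluated
    c_right = int(x[1] >= 0.7)  # right child, always evaluated
    c1 = (1 - c0) * c_left + c0 * c_right  # select the child result without branching
    return 2 * c0 + c1          # leaf index = path bits read as a binary number


# Style 3: tree stored in memory as arrays (hypothetical layout).
# Each level loads a feature, a threshold and a child index, and each load
# depends on the result of the previous level.
FEATURE = [0, 1, 1]
THRESHOLD = [0.5, 0.3, 0.7]
# CHILDREN[i][0] is the left child and CHILDREN[i][1] the right child.
# A negative value -k - 1 encodes leaf k.
CHILDREN = [[1, 2], [-1, -2], [-3, -4]]


def predict_memory(x: Sequence[float]) -> int:
    i = 0
    while i >= 0:
        go_right = int(x[FEATURE[i]] >= THRESHOLD[i])
        i = CHILDREN[i][go_right]  # indirection: the next node index comes from memory
    return -i - 1


if __name__ == "__main__":
    samples = [(0.2, 0.1), (0.2, 0.9), (0.8, 0.5), (0.8, 0.9)]
    for s in samples:
        print(s, predict_branch(s), predict_branchless(s), predict_memory(s))
```

All three functions return the same leaf for every input, including inputs that fall exactly on a threshold. Only the control flow and the memory accesses differ between them, and these are the properties the microbenchmarks measure on the GPU.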
