Using AVX-512 Technology to Accelerate the POPF Algorithm

André Libório; Alexandro Baldassin; João Paulo Papa

doi:10.5753/wscad_estendido.2022.226368

André Libório UNESP
Alexandro Baldassin UNESP
João Paulo Papa UNESP

Abstract

With the spread of AVX-512 vectorization technology over the last decade, it has become worthwhile to evaluate its performance in new applications. This paper presents a study of AVX-512 applied to a graph-based machine learning algorithm, the Parallel Optimum-Path Forest (POPF). The experiments show a performance gain of up to 64% over the original, non-vectorized version and of up to 23% over AVX2. The gains were more modest in scenarios that also use multithreading, but the AVX-512 version still achieved the best results overall.
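The abstract does not say which part of POPF was vectorized, so the following is only an illustrative sketch of the kind of kernel that usually benefits from AVX-512 in distance-based classifiers. It assumes that samples are stored as arrays of `float` features and that the hot loop is a squared Euclidean distance between two samples. The function names `sq_dist_scalar` and `sq_dist_simd` are hypothetical and do not come from the paper. The vector path processes 16 floats per iteration and is compiled only when the compiler defines `__AVX512F__` (for example with `-mavx512f`); otherwise the code falls back to the scalar loop.

```c
#include <stddef.h>
#if defined(__AVX512F__)
#include <immintrin.h>
#endif

/* Scalar reference: squared Euclidean distance between two feature vectors. */
static float sq_dist_scalar(const float *a, const float *b, size_t n)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++) {
        float d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}

/* Hypothetical vectorized variant (not the authors' code).
 * With AVX-512F enabled, each iteration handles 16 floats in one
 * 512-bit register; leftover elements are handled by a scalar tail. */
static float sq_dist_simd(const float *a, const float *b, size_t n)
{
#if defined(__AVX512F__)
    __m512 acc = _mm512_setzero_ps();
    size_t i = 0;
    for (; i + 16 <= n; i += 16) {
        __m512 va = _mm512_loadu_ps(a + i);  /* unaligned loads */
        __m512 vb = _mm512_loadu_ps(b + i);
        __m512 d  = _mm512_sub_ps(va, vb);
        acc = _mm512_fmadd_ps(d, d, acc);    /* acc += d * d */
    }
    float sum = _mm512_reduce_add_ps(acc);   /* horizontal sum of 16 lanes */
    for (; i < n; i++) {                     /* scalar tail, fewer than 16 items */
        float d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
#else
    /* No AVX-512F at compile time: use the scalar loop. */
    return sq_dist_scalar(a, b, n);
#endif
}
```

Because the vector path accumulates in a different order than the scalar loop, the two results can differ in the last bits for general floating-point inputs. A binary built with `-mavx512f` also needs a CPU that supports AVX-512F at run time.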
