Using AVX-512 Technology to Accelerate the POPF Algorithm

  • André Libório UNESP
  • Alexandro Baldassin UNESP
  • João Paulo Papa UNESP

Abstract


With the popularization of AVX-512 vectorization technology over the last decade, it has become attractive to evaluate its performance in new applications. This article presents a study of AVX-512 applied to a graph-based machine learning algorithm, Parallel Optimum-Path Forest (POPF). The experiments show a performance gain of up to 64% over the original, unvectorized version and up to 23% over the AVX2 version. The gains were more modest in scenarios where multithreading is also employed, but AVX-512 still delivered the best results overall.

Published
2022-10-19
LIBÓRIO, André; BALDASSIN, Alexandro; PAPA, João Paulo. Emprego da tecnologia AVX-512 para aceleração do algoritmo POPF. In: UNDERGRADUATE RESEARCH WORKSHOP - SYMPOSIUM ON HIGH PERFORMANCE COMPUTING SYSTEMS (SSCAD), 23., 2022, Florianópolis. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2022. p. 33-40. DOI: https://doi.org/10.5753/wscad_estendido.2022.226368.