Performance analysis of matrix calculus in parallel systems using AVX-512

André Libório; Alexandro Baldassin; João Paulo Papa

doi:10.5753/eradsp.2022.222245

André Libório (UNESP)
Alexandro Baldassin (UNESP), ORCID: https://orcid.org/0000-0001-8824-3055
João Paulo Papa (UNESP), ORCID: https://orcid.org/0000-0002-6494-7514

Abstract

Motivated by the software optimization opportunities that recent hardware technologies offer, this study analyzes the benefit of hardware-based vectorization, namely the AVX2 and AVX-512 instruction set extensions, when applied to matrix multiplication. The results show that vectorization yields substantial performance gains and highlight the advantages of AVX-512.
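The abstract does not show how vectorization is applied to matrix multiplication, so the following C sketch illustrates the general idea; it is not the authors' implementation. It computes the row-major single-precision product C = A * B for n x n matrices and contains three kernels: a scalar baseline, an AVX2 + FMA kernel that updates 8 floats per instruction, and an AVX-512F kernel that updates 16 floats per instruction. The function names (`matmul`, `matmul_scalar`, `matmul_avx2`, `matmul_avx512`) and the `HAVE_X86` macro are hypothetical. It assumes GCC or Clang on x86, relying on `__attribute__((target(...)))` so the file compiles without `-mavx512f`, and on `__builtin_cpu_supports` to pick the widest kernel the running CPU actually supports.

```c
/* Hypothetical sketch, not the authors' code: row-major single-precision
 * C = A * B for n x n matrices, with AVX-512 and AVX2+FMA kernels and a
 * scalar fallback selected at run time. Assumes GCC or Clang. */
#include <assert.h>
#include <stddef.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#define HAVE_X86 1
#endif

/* Scalar baseline in i-k-j order: the inner loop walks B and C contiguously. */
void matmul_scalar(size_t n, const float *A, const float *B, float *C) {
    for (size_t i = 0; i < n * n; i++) C[i] = 0.0f;
    for (size_t i = 0; i < n; i++)
        for (size_t k = 0; k < n; k++) {
            float a = A[i * n + k];
            for (size_t j = 0; j < n; j++) C[i * n + j] += a * B[k * n + j];
        }
}

#ifdef HAVE_X86
/* AVX2 + FMA: one 256-bit register holds 8 floats. */
__attribute__((target("avx2,fma")))
static void matmul_avx2(size_t n, const float *A, const float *B, float *C) {
    for (size_t i = 0; i < n * n; i++) C[i] = 0.0f;
    for (size_t i = 0; i < n; i++)
        for (size_t k = 0; k < n; k++) {
            __m256 a = _mm256_set1_ps(A[i * n + k]); /* broadcast A[i][k] */
            size_t j = 0;
            for (; j + 8 <= n; j += 8) { /* C[i][j..j+7] += a * B[k][j..j+7] */
                __m256 c = _mm256_loadu_ps(&C[i * n + j]);
                __m256 b = _mm256_loadu_ps(&B[k * n + j]);
                _mm256_storeu_ps(&C[i * n + j], _mm256_fmadd_ps(a, b, c));
            }
            for (; j < n; j++) /* scalar tail when n is not a multiple of 8 */
                C[i * n + j] += A[i * n + k] * B[k * n + j];
        }
}

/* AVX-512F: one 512-bit register holds 16 floats. */
__attribute__((target("avx512f")))
static void matmul_avx512(size_t n, const float *A, const float *B, float *C) {
    for (size_t i = 0; i < n * n; i++) C[i] = 0.0f;
    for (size_t i = 0; i < n; i++)
        for (size_t k = 0; k < n; k++) {
            __m512 a = _mm512_set1_ps(A[i * n + k]); /* broadcast A[i][k] */
            size_t j = 0;
            for (; j + 16 <= n; j += 16) { /* C[i][j..j+15] += a * B[k][j..j+15] */
                __m512 c = _mm512_loadu_ps(&C[i * n + j]);
                __m512 b = _mm512_loadu_ps(&B[k * n + j]);
                _mm512_storeu_ps(&C[i * n + j], _mm512_fmadd_ps(a, b, c));
            }
            for (; j < n; j++) /* scalar tail when n is not a multiple of 16 */
                C[i * n + j] += A[i * n + k] * B[k * n + j];
        }
}
#endif

/* Dispatches to the widest kernel the running CPU supports. */
void matmul(size_t n, const float *A, const float *B, float *C) {
    assert(A != NULL && B != NULL && C != NULL);
#ifdef HAVE_X86
    if (__builtin_cpu_supports("avx512f")) {
        matmul_avx512(n, A, B, C);
        return;
    }
    if (__builtin_cpu_supports("avx2") && __builtin_cpu_supports("fma")) {
        matmul_avx2(n, A, B, C);
        return;
    }
#endif
    matmul_scalar(n, A, B, C);
}
```

With `A`, `B` and `C` each holding `n * n` floats in row-major order, `matmul(n, A, B, C)` writes the product into `C`. The i-k-j loop order is chosen because it makes the innermost loop read `B` and write `C` at consecutive addresses, which is what allows full-width unaligned vector loads and stores; the only difference between the two vector kernels is the register width (8 versus 16 floats). A tuned kernel would typically also block for cache and keep partial sums in registers, which this sketch deliberately omits.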

Keywords: Parallel and Distributed Algorithms, Machine Learning, Data Science, High-Performance Computing
