Convolution Operators for Deep Learning Inference on the Fujitsu A64FX Processor

  • Manuel F. Dolz Universitat Jaume I de Castellón
  • Héctor Martínez Universidad de Córdoba
  • Pedro Alonso Universitat Politècnica de València
  • Enrique S. Quintana-Ortí Universitat Politècnica de València

Abstract


The convolution operator is a crucial kernel for many computer vision and signal processing applications that rely on deep learning (DL) technologies. As such, the efficient implementation of this operator has received considerable attention in the past few years for a fair range of processor architectures. In this paper, we follow the technology trend toward integrating long SIMD (single instruction, multiple data) arithmetic units into high-performance multicore processors to analyse the benefits of this type of hardware acceleration for latency-constrained DL workloads. For this purpose, we implement and optimise three distinct methods for computing the convolution on the Fujitsu A64FX processor: the lowering approach, a blocked variant of the direct convolution algorithm, and the Winograd minimal filtering algorithm. Our experimental results include an extensive evaluation of the parallel scalability of these three methods and a comparison of their global performance using three popular DL models and a representative dataset.
Keywords: Convolutional neural networks, high performance, SIMD arithmetic units, ARM-based A64FX processor
Published
02/11/2022
DOLZ, Manuel F.; MARTÍNEZ, Héctor; ALONSO, Pedro; QUINTANA-ORTÍ, Enrique S. Convolution Operators for Deep Learning Inference on the Fujitsu A64FX Processor. In: INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE AND HIGH PERFORMANCE COMPUTING (SBAC-PAD), 34., 2022, Bordeaux/France. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2022. p. 1-10.