High Performance and Portable Convolution Operators for Multicore Processors

  • Pablo San Juan, Universitat Politècnica de València
  • Adrián Castelló, Universitat Jaume I
  • Manuel F. Dolz, Universitat Jaume I
  • Pedro Alonso-Jordá, Universitat Politècnica de València
  • Enrique S. Quintana-Ortí, Universitat Politècnica de València

Abstract


The considerable impact of Convolutional Neural Networks on many Artificial Intelligence tasks has led to the development of various high performance algorithms for the convolution operator present in this type of network. One of these approaches applies the IM2COL transform followed by a general matrix multiplication (GEMM), in order to take advantage of the highly optimized realizations of the GEMM kernel in many linear algebra libraries. This approach has two main problems: 1) the large memory workspace required to host the intermediate matrices generated by the IM2COL transform; and 2) the time to perform the IM2COL transform, which is not negligible for complex neural networks. This paper presents a portable high performance convolution algorithm, based on the BLIS realization of the GEMM kernel, that avoids the intermediate memory workspace by taking advantage of the BLIS structure. In addition, the proposed algorithm eliminates the cost of the explicit IM2COL transform while maintaining the portability and performance of the underlying realization of GEMM in BLIS.
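To make the baseline concrete, the following is a minimal sketch of the IM2COL-plus-GEMM approach that the abstract describes as the starting point. It is not the algorithm proposed in the paper and does not use BLIS. The function names (`im2col`, `gemm`, `conv_im2col`, `conv_direct`), the nested-list tensor layout, and the restriction to stride 1 with no padding are all assumptions made for illustration.

```python
def im2col(x, R, S):
    """Unfold a C x H x W input (nested lists) into a (C*R*S) x (Ho*Wo) matrix.

    Illustrative assumptions: stride 1, no padding.
    """
    C, H, W = len(x), len(x[0]), len(x[0][0])
    Ho, Wo = H - R + 1, W - S + 1
    rows = []
    for c in range(C):
        for r in range(R):
            for s in range(S):
                # One row per (channel, filter row, filter column) offset;
                # one column per output pixel.
                rows.append([x[c][i + r][j + s]
                             for i in range(Ho) for j in range(Wo)])
    return rows


def gemm(A, B):
    """Naive matrix product, standing in for an optimized GEMM kernel."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def conv_im2col(x, w):
    """Convolution as GEMM: K x (C*R*S) filter matrix times the im2col matrix."""
    R, S = len(w[0][0]), len(w[0][0][0])
    # Flatten each filter in (channel, row, column) order to match im2col.
    A = [[v for channel in f for row in channel for v in row] for f in w]
    return gemm(A, im2col(x, R, S))


def conv_direct(x, w):
    """Reference convolution (CNN-style, no filter flip) without a workspace."""
    C, H, W = len(x), len(x[0]), len(x[0][0])
    K, R, S = len(w), len(w[0][0]), len(w[0][0][0])
    Ho, Wo = H - R + 1, W - S + 1
    return [[sum(w[k][c][r][s] * x[c][i + r][j + s]
                 for c in range(C) for r in range(R) for s in range(S))
             for i in range(Ho) for j in range(Wo)]
            for k in range(K)]
```

Even in a toy case, a single-channel 3 x 3 input with a 2 x 2 filter, the intermediate matrix produced by `im2col` holds 16 elements against 9 in the input, because overlapping patches are duplicated. That duplication is the memory workspace cost named in problem 1, and building the matrix is the time cost named in problem 2.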
Keywords: Program processors, Kernel, Convolution, Transforms, Multicore processing, Libraries, Linear algebra, Convolutional neural networks, high performance, multicore processors
Published
08/09/2020
JUAN, Pablo San; CASTELLÓ, Adrián; DOLZ, Manuel F.; ALONSO-JORDÁ, Pedro; QUINTANA-ORTÍ, Enrique S. High Performance and Portable Convolution Operators for Multicore Processors. In: INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE AND HIGH PERFORMANCE COMPUTING (SBAC-PAD), 32., 2020, Porto/Portugal. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2020. p. 91-98.