High-Performance Low-Memory Lowering: GEMM-based Algorithms for DNN Convolution
Abstract
Deep Neural Network convolution is often implemented with general matrix multiplication (GEMM) using the well-known im2col algorithm. This algorithm constructs a Toeplitz matrix from the input feature maps and multiplies it by the convolutional kernel. With input feature map dimensions C × H × W and kernel dimensions M × C × K^2, im2col requires O(K^2 CHW) additional space. Although this approach is very popular, there has been little study of the associated design space. We show that the im2col algorithm is just one point in a regular design space of algorithms which translate convolution to GEMM. We enumerate this design space and experimentally evaluate each algorithmic variant. Our evaluation yields several novel low-memory algorithms which match the performance of the best known approaches despite requiring only a small fraction of the additional memory.
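The im2col lowering described above can be illustrated with a minimal NumPy sketch (stride 1, no padding; the function and variable names are illustrative, not taken from the paper's implementation). Each column of the lowered matrix holds one flattened K × K receptive field across all C channels, so the whole convolution reduces to a single GEMM:

```python
import numpy as np

def im2col(x, K):
    # x: input feature map of shape (C, H, W); K: square kernel size.
    # Builds the lowered matrix of shape (C*K*K, H_out*W_out), where each
    # column is one flattened K x K receptive field across all channels.
    C, H, W = x.shape
    H_out, W_out = H - K + 1, W - K + 1  # 'valid' convolution, stride 1
    cols = np.empty((C * K * K, H_out * W_out), dtype=x.dtype)
    row = 0
    for c in range(C):
        for i in range(K):
            for j in range(K):
                # All output positions that read x[c, i + r, j + s]
                cols[row] = x[c, i:i + H_out, j:j + W_out].reshape(-1)
                row += 1
    return cols

def conv_gemm(x, w):
    # w: kernels of shape (M, C, K, K). The convolution becomes one GEMM:
    # (M, C*K*K) @ (C*K*K, H_out*W_out) -> reshaped to (M, H_out, W_out).
    M, C, K, _ = w.shape
    cols = im2col(x, K)
    out = w.reshape(M, -1) @ cols
    H_out = x.shape[1] - K + 1
    return out.reshape(M, H_out, -1)
```

The `cols` buffer is exactly the O(K^2 CHW) additional space the abstract refers to: the lowered matrix replicates each input element up to K^2 times so that the GEMM can read contiguous columns.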
Keywords:
Convolution, kernel, layout, tensors, software algorithms, buildings, two-dimensional displays, neural networks, embedded software, performance
Published
08/09/2020
How to Cite
ANDERSON, Andrew; VASUDEVAN, Aravind; KEANE, Cormac; GREGG, David. High-Performance Low-Memory Lowering: GEMM-based Algorithms for DNN Convolution. In: INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE AND HIGH PERFORMANCE COMPUTING (SBAC-PAD), 32., 2020, Porto/Portugal. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2020. p. 99-106.
