High-Performance Low-Memory Lowering: GEMM-based Algorithms for DNN Convolution
Abstract
Deep Neural Network convolution is often implemented with general matrix multiplication (GEMM) using the well-known im2col algorithm. This algorithm constructs a Toeplitz matrix from the input feature maps and multiplies it by the convolutional kernel. With input feature map dimensions C × H × W and kernel dimensions M × C × K^2, im2col requires O(K^2 CHW) additional space. Although this approach is very popular, there has been little study of the associated design space. We show that the im2col algorithm is just one point in a regular design space of algorithms which translate convolution to GEMM. We enumerate this design space and experimentally evaluate each algorithmic variant. Our evaluation yields several novel low-memory algorithms which match the performance of the best known approaches despite requiring only a small fraction of the additional memory.
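The im2col lowering described above can be illustrated with a minimal NumPy sketch (stride 1, no padding; the function and variable names are illustrative, not taken from the paper's implementation). Each column of the lowered matrix holds one flattened K × K receptive field across all C channels, so the whole convolution reduces to a single GEMM:

```python
import numpy as np

def im2col(x, K):
    # x: input feature map of shape (C, H, W); K: square kernel size.
    # Builds the lowered matrix of shape (C*K*K, H_out*W_out), where each
    # column is one flattened K x K receptive field across all channels.
    C, H, W = x.shape
    H_out, W_out = H - K + 1, W - K + 1  # 'valid' convolution, stride 1
    cols = np.empty((C * K * K, H_out * W_out), dtype=x.dtype)
    row = 0
    for c in range(C):
        for i in range(K):
            for j in range(K):
                # All output positions that read x[c, i + r, j + s]
                cols[row] = x[c, i:i + H_out, j:j + W_out].reshape(-1)
                row += 1
    return cols

def conv_gemm(x, w):
    # w: kernels of shape (M, C, K, K). The convolution becomes one GEMM:
    # (M, C*K*K) @ (C*K*K, H_out*W_out) -> reshaped to (M, H_out, W_out).
    M, C, K, _ = w.shape
    cols = im2col(x, K)
    out = w.reshape(M, -1) @ cols
    H_out = x.shape[1] - K + 1
    return out.reshape(M, H_out, -1)
```

The `cols` buffer is exactly the O(K^2 CHW) additional space the abstract refers to: the lowered matrix replicates each input element up to K^2 times so that the GEMM can read contiguous columns.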
Keywords:
Convolution, kernel, layout, tensors, software algorithms, buildings, two-dimensional displays, neural networks, embedded software, performance
Published
08/09/2020
How to Cite
ANDERSON, Andrew; VASUDEVAN, Aravind; KEANE, Cormac; GREGG, David. High-Performance Low-Memory Lowering: GEMM-based Algorithms for DNN Convolution. In: INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE AND HIGH PERFORMANCE COMPUTING (SBAC-PAD), 32., 2020, Porto/Portugal. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2020. p. 99-106.
