%A Ferrari, Victor
%A Araujo, Guido
%D 2024
%T Improving Direct Convolution through Tensor Slicing, Vectorized Packing and ISA Extensions
%K
%X Convolution is one of the most computationally intensive operations in machine-learning models and is usually implemented with the traditional Im2Col + BLAS method. This work describes SConv, a novel direct-convolution algorithm that improves upon Im2Col + BLAS by introducing compile-time and execution-time components to tile, vectorize, and optimize the computation. For end-to-end machine-learning model inference, SConv’s speedup over an Im2Col + BLAS method based on current BLAS implementations ranges from 11% to 27% on Intel x86 and from 11% to 34% on IBM POWER10 architectures. The total convolution speedup for model inference is 13% to 28% on Intel x86 and 23% to 39% on IBM POWER10. SConv also outperforms oneDNN in 6 out of 7 models.
%U https://sol.sbc.org.br/index.php/ctd/article/view/29278
%J Anais do Concurso de Teses e Dissertações (CTD)
%0 Journal Article
%R 10.5753/ctd.2024.2901
%P 148-157
%@ 2763-8820
%8 2024-07-21