Speeding Up Stencil Computations with Kernel Convolution

  • Guilherme C. Januario IBM T.J. Watson Research Center
  • Bryan S. Rosenburg USP
  • Yoonho Park IBM T.J. Watson Research Center
  • Michael Perrone IBM T.J. Watson Research Center
  • Jose Moreira IBM T.J. Watson Research Center
  • Tereza C. M. B. Carvalho USP

Abstract


A technique to speed up stencil computation is introduced. Computation and data reuse schemes are developed for its application to 1- and 3-dimensional stencils. The approach traverses the data domain fewer times than a state-of-the-art, straightforward iterative stencil implementation would. Performance results are shown for a variety of platforms, exemplifying how it can be straightforwardly applied with existing techniques and frameworks. The technique, named Aggregate Stencil-Loop Iteration (ASLI), works by applying a stencil obtained by the original stencil operator convolved with itself one or more times. This more complex operator creates new opportunities for in-register data reuse and increases the FLOPs-to-load ratio. The total number of FLOPs decreases for 1D but increases for 2D and 3D star-shaped stencils. In both scenarios, speed-up relative to the state-of-the-art is achieved. ASLI is relatively easy to implement and works synergistically with existing methods to optimize stencil computations.
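The core idea in the abstract, applying the stencil convolved with itself once instead of applying the original stencil twice, can be illustrated with a minimal 1D sketch. This is not the authors' implementation. The helper names (`convolve`, `apply_stencil`, `flops_per_point`), the 3-point weights, the periodic boundary handling, and the FLOP-counting convention (one multiply per tap, one add between taps) are all assumptions made for illustration.

```python
def convolve(a, b):
    """Full discrete convolution of two 1D coefficient lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def apply_stencil(u, w):
    """One sweep of stencil w over u (odd-length w, periodic boundaries assumed)."""
    n, r = len(u), len(w) // 2
    return [sum(w[k] * u[(i + k - r) % n] for k in range(len(w)))
            for i in range(n)]


def flops_per_point(w):
    """Naive count (assumption): one multiply per tap plus adds between taps."""
    return 2 * len(w) - 1


# Hypothetical 3-point smoothing stencil and a small data domain.
w = [0.25, 0.5, 0.25]
u = [1.0, 4.0, 2.0, 8.0, 5.0, 7.0, 3.0, 6.0]

# Baseline: two time steps, two traversals of the data.
two_sweeps = apply_stencil(apply_stencil(u, w), w)

# Aggregated: convolve the stencil with itself, then traverse the data once.
w2 = convolve(w, w)               # a 5-point stencil
one_sweep = apply_stencil(u, w2)

# Both paths should agree up to floating-point rounding.
max_diff = max(abs(a - b) for a, b in zip(one_sweep, two_sweeps))
print("max difference:", max_diff)
print("FLOPs per point, two 3-point sweeps:", 2 * flops_per_point(w))
print("FLOPs per point, one 5-point sweep:", flops_per_point(w2))
```

Under this naive counting, the aggregated 1D stencil needs fewer FLOPs per point than two sweeps of the original. This is consistent with the abstract's statement for 1D. For 2D and 3D star-shaped stencils the convolved operator has many more taps, so the FLOP count rises while the number of data traversals still falls.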
Keywords: Three-dimensional displays, Kernel, Convolution, Mathematical model, Aggregates, Registers, Electronic mail, stencil computation, optimization, high-performance computing, numeric kernels
Published: 2016-10-26
JANUARIO, Guilherme C.; ROSENBURG, Bryan S.; PARK, Yoonho; PERRONE, Michael; MOREIRA, Jose; CARVALHO, Tereza C. M. B. Speeding Up Stencil Computations with Kernel Convolution. In: INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE AND HIGH PERFORMANCE COMPUTING (SBAC-PAD), 28., 2016, Los Angeles/USA. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2016. p. 76-83.