Speeding Up Stencil Computations with Kernel Convolution
Abstract
A technique to speed up stencil computations is introduced. Computation and data reuse schemes are developed for its application to 1- and 3-dimensional stencils. The approach traverses the data domain fewer times than a state-of-the-art, straightforward iterative stencil implementation would. Performance results are shown for a variety of platforms, illustrating how the technique can be combined directly with existing techniques and frameworks. The technique, named Aggregate Stencil-Loop Iteration (ASLI), works by applying a stencil obtained by convolving the original stencil operator with itself one or more times. This more complex operator creates new opportunities for in-register data reuse and increases the FLOPs-to-load ratio. The total number of FLOPs decreases for 1D stencils but increases for 2D and 3D star-shaped stencils. In both scenarios, a speed-up relative to the state of the art is achieved. ASLI is relatively easy to implement and works synergistically with existing methods to optimize stencil computations.
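The core idea in the abstract, applying a stencil convolved with itself so that one sweep advances two time steps, can be illustrated with a minimal 1D sketch. This is not the authors' implementation: the 3-point coefficients, the periodic boundary handling, the helper names `convolve` and `apply_stencil`, and the FLOP-counting convention (one multiply per coefficient, one add between terms) are all assumptions made for illustration.

```python
def convolve(a, b):
    """Full discrete convolution of two coefficient lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def apply_stencil(u, w):
    """Apply a centered, odd-length stencil w to u (periodic boundaries assumed)."""
    n, r = len(u), len(w) // 2
    return [sum(w[k] * u[(i + k - r) % n] for k in range(len(w)))
            for i in range(n)]


# Hypothetical 3-point stencil; the coefficients are illustrative only.
w = [0.25, 0.5, 0.25]
# Aggregated stencil: the original operator convolved with itself once.
w2 = convolve(w, w)

u = [float((7 * i) % 11) for i in range(16)]

# Two sweeps with the original stencil versus one sweep with the aggregate.
two_sweeps = apply_stencil(apply_stencil(u, w), w)
one_sweep = apply_stencil(u, w2)
max_diff = max(abs(a - b) for a, b in zip(two_sweeps, one_sweep))

# FLOPs per output point under the assumed counting convention.
flops_two_sweeps = 2 * (2 * len(w) - 1)
flops_one_sweep = 2 * len(w2) - 1
```

Under these assumptions the aggregated operator has five points, both paths give the same result, and the per-point FLOP count drops from 10 to 9 while the data is traversed once instead of twice, which is in line with the 1D behaviour the abstract describes.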
Keywords:
Three-dimensional displays, Kernel, Convolution, Mathematical model, Aggregates, Registers, Electronic mail, stencil computation, optimization, high-performance computing, numeric kernels
Published
2016-10-26
How to Cite
JANUARIO, Guilherme C.; ROSENBURG, Bryan S.; PARK, Yoonho; PERRONE, Michael; MOREIRA, Jose; CARVALHO, Tereza C. M. B. Speeding Up Stencil Computations with Kernel Convolution. In: INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE AND HIGH PERFORMANCE COMPUTING (SBAC-PAD), 28., 2016, Los Angeles/USA. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2016. p. 76-83.
