A Fast and Generic GPU-Based Parallel Reduction Implementation
Abstract: Reduction operations are extensively employed in many computational problems, where a finite set of numeric elements is combined into a single value by means of a combining function. A parallel reduction, in turn, is the same operation performed concurrently when multiple execution units are available. This work presents a GPU-based parallel reduction approach that employs techniques such as loop unrolling, persistent threads, and algebraic expressions to avoid thread divergence, and that outperformed the methods currently in use. Experiments conducted to evaluate the approach show that the strategy performs efficiently on both AMD and NVIDIA hardware, and with both OpenCL and CUDA, making it portable.
Keywords: Graphics processing units, Indexes, Synchronization, Hardware, Bandwidth, Proposals, Instruction sets, GPU, Parallel Reduction, Fast, Generic
JRADI, Walid Abdala Rfaei; DO NASCIMENTO, Hugo Alexandre Dantas; MARTINS, Wellington Santos. A Fast and Generic GPU-Based Parallel Reduction Implementation. In: SIMPÓSIO EM SISTEMAS COMPUTACIONAIS DE ALTO DESEMPENHO (WSCAD), 19., 2018, São Paulo. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2018. p. 16-22.
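
To make the notion of reduction in the abstract concrete, the following is a minimal sketch, not the authors' kernel. It is a sequential CPU simulation in Python of a common two-phase pattern: a fixed pool of workers first folds strided slices of the input (loosely analogous to persistent threads), and the partial results are then combined pairwise in a tree. The function name `tree_reduce` and the parameter `num_workers` are hypothetical, chosen only for illustration. The sketch assumes the combining function is associative and commutative and that `identity` is its neutral element.

```python
import operator


def tree_reduce(values, combine, identity, num_workers=4):
    """Sequentially simulate a two-phase parallel reduction (illustrative only).

    `combine` is assumed associative and commutative, because the strided
    partition below reorders the elements. `identity` is its neutral element.
    """
    # Phase 1: simulated worker w folds elements w, w + num_workers, ...
    partials = [identity] * num_workers
    for w in range(num_workers):
        for i in range(w, len(values), num_workers):
            partials[w] = combine(partials[w], values[i])

    # Phase 2: pairwise tree over the partials; the number of active
    # slots is roughly halved on every step until one value remains.
    active = num_workers
    while active > 1:
        half = (active + 1) // 2
        for w in range(active // 2):
            partials[w] = combine(partials[w], partials[w + half])
        active = half
    return partials[0]


if __name__ == "__main__":
    # Generic over the combining function: sum and maximum of the same data.
    data = list(range(1, 11))
    print(tree_reduce(data, operator.add, 0))
    print(tree_reduce(data, max, float("-inf")))
```

On a real GPU each iteration of the inner loops would run concurrently, with synchronization between tree steps; the simulation only shows the order in which values are combined.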