A Fast and Generic GPU-Based Parallel Reduction Implementation

  • Walid Abdala Rfaei Jradi (UFG)
  • Hugo Alexandre Dantas do Nascimento (UFG)
  • Wellington Santos Martins (UFG)

Abstract

Reduction operations are extensively employed in many computational problems: a finite set of numeric elements is combined into a single value by means of a combining function. A parallel reduction, in turn, performs this operation concurrently when multiple execution units are available. This work presents a GPU-based parallel approach to reduction that employs loop unrolling, persistent threads, and algebraic expressions to avoid thread divergence, and that outperformed the methods currently in use. Experiments conducted to evaluate the approach show that the strategy performs efficiently on both AMD and NVIDIA hardware, and with both OpenCL and CUDA, making it portable.
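
To make the terms in the abstract concrete, the following is a minimal CPU-side sketch in Python of what a reduction with a combining function looks like when organized the way GPU reductions commonly are. It is not the authors' implementation: the function names (`strided_partial_results`, `tree_reduce`), the worker count, and the two-phase layout are assumptions chosen for illustration, and the loops that a GPU would run concurrently are executed sequentially here. The sketch also assumes the combining function is associative and commutative and has an identity element.

```python
import operator


def strided_partial_results(values, combine, identity, num_workers):
    """Phase 1 (illustrative): a fixed pool of workers, each walking the
    input with a stride of num_workers and accumulating a private result.
    This loosely mirrors the persistent-threads idea of launching a fixed
    number of threads that loop over the data."""
    partials = [identity] * num_workers
    for w in range(num_workers):  # on a GPU, workers run concurrently
        for i in range(w, len(values), num_workers):
            partials[w] = combine(partials[w], values[i])
    return partials


def tree_reduce(values, combine, identity):
    """Phase 2 (illustrative): pairwise, tree-shaped reduction in which
    each pass combines element i with element i + stride and halves the
    number of active elements."""
    buf = list(values)
    if not buf:
        return identity
    # Pad with the identity up to a power of two so every pass halves cleanly.
    size = 1
    while size < len(buf):
        size *= 2
    buf.extend([identity] * (size - len(buf)))
    stride = size // 2
    while stride > 0:
        for i in range(stride):  # on a GPU, these iterations run concurrently
            buf[i] = combine(buf[i], buf[i + stride])
        stride //= 2
    return buf[0]


data = list(range(1, 11))  # the integers 1..10
partials = strided_partial_results(data, operator.add, 0, num_workers=4)
total = tree_reduce(partials, operator.add, 0)
```

Because the strided phase regroups the elements, the result matches a plain left-to-right fold only for associative and commutative operations such as integer addition, `min`, or `max`; floating-point addition may differ slightly in rounding.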
Published
2018-10-01
How to Cite
JRADI, Walid Abdala Rfaei; DO NASCIMENTO, Hugo Alexandre Dantas; MARTINS, Wellington Santos. A Fast and Generic GPU-Based Parallel Reduction Implementation. Anais do Simpósio em Sistemas Computacionais de Alto Desempenho (SSCAD), [S.l.], p. 16-22, Oct. 2018. ISSN 0000-0000. Available at: <https://sol.sbc.org.br/index.php/sscad/article/view/15636>. Accessed: 17 May 2024.