Strategies to Improve the Performance and Energy Efficiency of Stencil Computations for NVIDIA GPUs

  • Pablo José Pavan
  • Matheus da Silva Serpa
  • Víctor Martínez
  • Edson Luiz Padoin
  • Jairo Panetta
  • Philippe O. A. Navaux

Abstract


The energy consumption and performance of parallel systems are a growing concern for new large-scale systems. Research in response to this challenge aims at building more energy-efficient systems. In this context, we improved performance and energy efficiency by developing three strategies that exploit the GPU memory subsystem (global, shared, and read-only memory). We also developed two optimizations that exploit data locality and the registers of the GPU architecture. Applied to GPU algorithms for stencil applications, our optimizations achieve a performance improvement over the naive version of up to 201.5% on the K80 and 264.6% on the P100, using shared memory and the read-only cache, respectively. The computational results show that combining read-only memory, Z-axis internalization of the stencil application, and reuse of specific architecture registers increases energy efficiency by up to 255.6% on the K80 and 314.8% on the P100.
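To make the combination named above concrete, the following CUDA sketch contrasts a naive kernel with one that uses the read-only data cache, an internal loop over the Z axis, and register reuse along Z. It is an illustration under stated assumptions, not the authors' code: the abstract does not give the stencil shape, so a 7-point, radius-1 stencil is assumed, and the kernel names, coefficients `c0` and `c1`, block sizes, and the helper `max_abs_diff` are all hypothetical. Error checking is omitted for brevity.

```cuda
#include <cassert>
#include <cmath>
#include <vector>
#include <cuda_runtime.h>

// Naive version (assumed 7-point stencil): one thread per grid point,
// every operand read from global memory.
__global__ void stencil_naive(const float* in, float* out,
                              int nx, int ny, int nz, float c0, float c1)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    int z = blockIdx.z * blockDim.z + threadIdx.z;
    if (x < 1 || x >= nx - 1 || y < 1 || y >= ny - 1 || z < 1 || z >= nz - 1) return;
    size_t plane = (size_t)nx * ny;
    size_t i = (size_t)z * plane + (size_t)y * nx + x;
    out[i] = c0 * in[i] + c1 * (in[i - 1] + in[i + 1] + in[i - nx] + in[i + nx]
                                + in[i - plane] + in[i + plane]);
}

// Sketch of the combined strategy: one thread per (x, y) column that marches
// along Z (Z-axis internalization), keeps the Z neighbours in registers
// (register reuse), and loads through the read-only cache with __ldg.
// `in` and `out` must not alias.
__global__ void stencil_zloop(const float* __restrict__ in, float* __restrict__ out,
                              int nx, int ny, int nz, float c0, float c1)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (nz < 3 || x < 1 || x >= nx - 1 || y < 1 || y >= ny - 1) return;
    size_t plane = (size_t)nx * ny;
    size_t i = plane + (size_t)y * nx + x;      // first interior plane, z = 1
    float behind  = __ldg(&in[i - plane]);      // value at z - 1
    float current = __ldg(&in[i]);              // value at z
    for (int z = 1; z < nz - 1; ++z, i += plane) {
        float front = __ldg(&in[i + plane]);    // the only new Z load per step
        out[i] = c0 * current + c1 * (__ldg(&in[i - 1]) + __ldg(&in[i + 1])
                                      + __ldg(&in[i - nx]) + __ldg(&in[i + nx])
                                      + behind + front);
        behind  = current;                      // shift the register window
        current = front;
    }
}

// Hypothetical helper: runs both kernels on the same input and returns the
// largest absolute difference between their outputs.
float max_abs_diff(int nx, int ny, int nz)
{
    size_t n = (size_t)nx * ny * nz, bytes = n * sizeof(float);
    std::vector<float> h_in(n), h_naive(n), h_zloop(n);
    for (size_t i = 0; i < n; ++i) h_in[i] = (float)(i % 17) * 0.25f;

    float *d_in, *d_naive, *d_zloop;
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_naive, bytes);
    cudaMalloc(&d_zloop, bytes);
    cudaMemcpy(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemset(d_naive, 0, bytes);              // boundary points stay zero
    cudaMemset(d_zloop, 0, bytes);

    dim3 block3(8, 8, 4), grid3((nx + 7) / 8, (ny + 7) / 8, (nz + 3) / 4);
    dim3 block2(32, 8),   grid2((nx + 31) / 32, (ny + 7) / 8);
    stencil_naive<<<grid3, block3>>>(d_in, d_naive, nx, ny, nz, 0.5f, 0.1f);
    stencil_zloop<<<grid2, block2>>>(d_in, d_zloop, nx, ny, nz, 0.5f, 0.1f);

    cudaMemcpy(h_naive.data(), d_naive, bytes, cudaMemcpyDeviceToHost);
    cudaMemcpy(h_zloop.data(), d_zloop, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_in); cudaFree(d_naive); cudaFree(d_zloop);

    float m = 0.0f;
    for (size_t i = 0; i < n; ++i) m = fmaxf(m, fabsf(h_naive[i] - h_zloop[i]));
    return m;
}
```

Calling `max_abs_diff(64, 48, 32)` from a host program compiled with `nvcc` checks that the two kernels agree on a small grid. In the second kernel each step along Z issues one new load in the Z direction instead of three, which is the kind of data reuse the abstract attributes to Z-axis internalization and register reuse. Any performance or energy gain depends on the device and problem size and is not shown by this sketch.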

Published
26/07/2018
PAVAN, Pablo José; SERPA, Matheus da Silva; MARTÍNEZ, Víctor; PADOIN, Edson Luiz; PANETTA, Jairo; NAVAUX, Philippe O. A. Strategies to Improve the Performance and Energy Efficiency of Stencil Computations for NVIDIA GPUs. In: WORKSHOP EM DESEMPENHO DE SISTEMAS COMPUTACIONAIS E DE COMUNICAÇÃO (WPERFORMANCE), 17., 2018, Natal. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2018. ISSN 2595-6167. DOI: https://doi.org/10.5753/wperformance.2018.3348.