Optimizing the Use of GPU Memory Subsystems for Stencil-Based Applications

  • Ricardo K. Lorenzoni UNIJUI
  • Matheus S. Serpa UFRGS
  • Edson L. Padoin UNIJUI / UFRGS
  • Jairo Panetta ITA
  • Philippe O. A. Navaux UFRGS
  • Jean-François Méhaut Université Grenoble Alpes

Abstract


Energy consumption and performance are growing concerns for new large-scale parallel systems. In response to this challenge, research has aimed at building more energy-efficient systems. In this context, this paper proposes to improve the performance and energy efficiency of stencil applications by optimizing the use of the GPU memory subsystem. Our GPU-optimized algorithms for stencil applications achieve a speedup of up to 2.85 over the naive version. The computational results show that combining Z-axis internalization of the stencil application with reuse of the architecture's registers can achieve energy savings of about 20.24% and an increase of up to 50% in energy efficiency.
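The abstract does not show the kernels themselves, so the following is only a minimal CPU sketch of what Z-axis internalization with register reuse is commonly understood to mean. Each (x, y) pair stands in for one GPU thread that marches along Z and keeps the Z-neighbors in local variables, which play the role of registers, instead of reloading them from memory. The 7-point stencil shape, the coefficients `c0` and `c1`, the flat array layout, and all function names are assumptions made for illustration, not the authors' implementation.

```python
def idx(x, y, z, nx, ny):
    """Flat index of point (x, y, z) in an array stored with x varying fastest."""
    return (z * ny + y) * nx + x


def stencil_naive(src, nx, ny, nz, c0=0.5, c1=0.1):
    """Naive version: every point reads all seven values from the source array."""
    dst = list(src)  # boundary points are copied unchanged
    for z in range(1, nz - 1):
        for y in range(1, ny - 1):
            for x in range(1, nx - 1):
                dst[idx(x, y, z, nx, ny)] = c0 * src[idx(x, y, z, nx, ny)] + c1 * (
                    src[idx(x - 1, y, z, nx, ny)] + src[idx(x + 1, y, z, nx, ny)]
                    + src[idx(x, y - 1, z, nx, ny)] + src[idx(x, y + 1, z, nx, ny)]
                    + src[idx(x, y, z - 1, nx, ny)] + src[idx(x, y, z + 1, nx, ny)]
                )
    return dst


def stencil_z_internalized(src, nx, ny, nz, c0=0.5, c1=0.1):
    """Z loop moved inside the per-(x, y) 'thread'; Z-neighbors are reused."""
    dst = list(src)
    if nz < 3:
        return dst  # no interior points along Z
    for y in range(1, ny - 1):
        for x in range(1, nx - 1):
            # Local variables emulate per-thread registers.
            behind = src[idx(x, y, 0, nx, ny)]
            current = src[idx(x, y, 1, nx, ny)]
            for z in range(1, nz - 1):
                # Only one new Z value is loaded per step.
                front = src[idx(x, y, z + 1, nx, ny)]
                dst[idx(x, y, z, nx, ny)] = c0 * current + c1 * (
                    src[idx(x - 1, y, z, nx, ny)] + src[idx(x + 1, y, z, nx, ny)]
                    + src[idx(x, y - 1, z, nx, ny)] + src[idx(x, y + 1, z, nx, ny)]
                    + behind + front
                )
                # Rotate the register window for the next Z plane.
                behind, current = current, front
    return dst
```

In this sketch the internalized version reads each value along the Z column once per column rather than three times, which is the kind of memory-traffic reduction the abstract attributes to register reuse. On a real GPU the X and Y neighbors would typically come from shared memory or cache, a detail omitted here.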

Published
2017-07-02
LORENZONI, Ricardo K.; SERPA, Matheus S.; PADOIN, Edson L.; PANETTA, Jairo; NAVAUX, Philippe O. A.; MÉHAUT, Jean-François. Optimizing the Use of GPU Memory Subsystems for Stencil-Based Applications. In: WORKSHOP ON PERFORMANCE OF COMPUTER AND COMMUNICATION SYSTEMS (WPERFORMANCE), 16., 2017, São Paulo. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2017. p. 1768-1773. ISSN 2595-6167. DOI: https://doi.org/10.5753/wperformance.2017.3352.