Optimizing the Use of GPU Memory Subsystems for Stencil-Based Applications
Abstract
The energy consumption and performance of parallel systems are a growing concern for new large-scale systems. In response to this challenge, research has aimed at building more energy-efficient systems. In this context, this paper proposes to improve the performance and energy efficiency of stencil applications by optimizing the use of the GPU memory subsystem. Our GPU-optimized algorithms for stencil applications achieve a speedup of up to 2.85× over the naive version. The computational results show that combining Z-axis internalization of the stencil computation with reuse of the architecture's registers yields an energy saving of about 20.24% and an increase of up to 50% in energy efficiency.
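To make the two techniques named above concrete, the following is a minimal sketch, not the paper's GPU code. It is written in plain C++ and assumes a 7-point stencil with unit coefficients on a flat array with x as the fastest dimension. The names `idx`, `stencil_naive`, and `stencil_z_internal` are hypothetical. In the second function, each (x, y) column plays the role of one GPU thread that walks along Z and keeps its Z-neighbors in local variables (standing in for registers), so each value along Z is loaded from memory only once per column.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: flat index into an nx*ny*nz grid, x fastest.
inline std::size_t idx(std::size_t x, std::size_t y, std::size_t z,
                       std::size_t nx, std::size_t ny) {
    return (z * ny + y) * nx + x;
}

// Naive version: every interior point reloads all seven inputs from memory.
std::vector<float> stencil_naive(const std::vector<float>& in,
                                 std::size_t nx, std::size_t ny, std::size_t nz) {
    assert(nx >= 3 && ny >= 3 && nz >= 3 && in.size() == nx * ny * nz);
    std::vector<float> out(in.size(), 0.0f);
    for (std::size_t z = 1; z + 1 < nz; ++z)
        for (std::size_t y = 1; y + 1 < ny; ++y)
            for (std::size_t x = 1; x + 1 < nx; ++x)
                out[idx(x, y, z, nx, ny)] =
                    in[idx(x, y, z, nx, ny)] +
                    in[idx(x - 1, y, z, nx, ny)] + in[idx(x + 1, y, z, nx, ny)] +
                    in[idx(x, y - 1, z, nx, ny)] + in[idx(x, y + 1, z, nx, ny)] +
                    in[idx(x, y, z - 1, nx, ny)] + in[idx(x, y, z + 1, nx, ny)];
    return out;
}

// Z-internalized version: the Z loop is innermost for each (x, y) column,
// and the Z-neighbors are carried in locals instead of being reloaded.
std::vector<float> stencil_z_internal(const std::vector<float>& in,
                                      std::size_t nx, std::size_t ny, std::size_t nz) {
    assert(nx >= 3 && ny >= 3 && nz >= 3 && in.size() == nx * ny * nz);
    std::vector<float> out(in.size(), 0.0f);
    for (std::size_t y = 1; y + 1 < ny; ++y) {
        for (std::size_t x = 1; x + 1 < nx; ++x) {
            float behind = in[idx(x, y, 0, nx, ny)];   // value at z - 1
            float current = in[idx(x, y, 1, nx, ny)];  // value at z
            for (std::size_t z = 1; z + 1 < nz; ++z) {
                // Only the value ahead in Z is a new load along this column.
                float front = in[idx(x, y, z + 1, nx, ny)];
                out[idx(x, y, z, nx, ny)] =
                    current +
                    in[idx(x - 1, y, z, nx, ny)] + in[idx(x + 1, y, z, nx, ny)] +
                    in[idx(x, y - 1, z, nx, ny)] + in[idx(x, y + 1, z, nx, ny)] +
                    behind + front;
                // Slide the register window one step along Z.
                behind = current;
                current = front;
            }
        }
    }
    return out;
}
```

Both functions produce the same output. The second replaces three loads along Z per point with one, which is the kind of memory-traffic reduction that register reuse targets. In a GPU kernel, the two outer loops would typically be replaced by the thread indices.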
