Mechanism for reducing energy waste in the post-execution of GPU applications (original title: Mecanismo para reduzir o desperdício energético na pós-execução de aplicações em GPU)

  • Emmanuell Carreño UFRGS
  • Adiel Sarates Jr. UFRGS
  • Philippe Navaux UFRGS

Abstract

With the increasing demand for GPU accelerators for general-purpose processing in HPC, the impact of these resources on power consumption cannot be overlooked. Several strategies have been applied to reduce power consumption, but they have mostly focused on energy savings during application execution. This work focuses on energy savings after execution. When the post-execution behavior of applications is analyzed on newer GPU cards, the power draw does not return to the idle state efficiently, which creates unexpected power waste. To address this inefficient return to idle power draw and the resulting post-execution power waste, we developed a strategy that reduces the waste with minimal impact on overall performance. With this strategy, energy savings of up to 15% in the post-execution stage were obtained for sequential application runs. For a single execution, our approach allows savings of up to 59% in post-execution power consumption.
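As an illustration of how a post-execution saving of this kind could be quantified, the sketch below integrates sampled power draw over time to estimate energy and then compares two traces. It is not the authors' measurement procedure: the function names, the sampling format, and the power traces are all hypothetical and chosen only to make the arithmetic concrete.

```python
def energy_joules(samples):
    """Estimate energy by trapezoidal integration of (time_s, power_w) samples."""
    total = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        total += 0.5 * (p0 + p1) * (t1 - t0)
    return total


def savings_percent(baseline_j, optimized_j):
    """Relative energy saving of the optimized run versus the baseline."""
    return 100.0 * (baseline_j - optimized_j) / baseline_j


# Hypothetical post-execution traces (seconds, watts); not measured data.
# Baseline: the card lingers at an elevated draw before falling to idle.
baseline = [(0.0, 50.0), (5.0, 50.0), (10.0, 20.0)]
# Optimized: the card is pushed back to idle draw shortly after the run ends.
optimized = [(0.0, 50.0), (1.0, 20.0), (10.0, 20.0)]

baseline_j = energy_joules(baseline)
optimized_j = energy_joules(optimized)
print(f"baseline:  {baseline_j:.1f} J")
print(f"optimized: {optimized_j:.1f} J")
print(f"saving:    {savings_percent(baseline_j, optimized_j):.1f} %")
```

In practice the samples would come from a power-monitoring interface on the GPU rather than from hard-coded lists, and the sampling interval would determine how accurately the return to idle is captured.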

Published
2014-10-08
How to Cite
CARREÑO, Emmanuell; SARATES JR., Adiel; NAVAUX, Philippe. Mecanismo para reduzir o desperdício energético na pós-execução de aplicações em GPU. Proceedings of the Symposium on High Performance Computing Systems (SSCAD), [S.l.], p. 123-134, oct. 2014. ISSN 0000-0000. Available at: <https://sol.sbc.org.br/index.php/sscad/article/view/15005>. Date accessed: 18 may 2024. doi: https://doi.org/10.5753/wscad.2014.15005.