Use of low-power embedded accelerators in the implementation of HPC systems

Emilio Hoffmann de O.; Jorge Silva Jr.; Edson Padoin; Phillipe Navaux

doi:10.5753/wscad.2015.14288

Emilio Hoffmann de O. UNIJUI
Jorge Silva Jr. UFRGS
Edson Padoin UNIJUI / UFRGS
Phillipe Navaux UFRGS

Abstract

This work analyzes the performance and energy efficiency of low-power embedded accelerators for building HPC systems, in light of the currently established power-consumption recommendations. Tests were run using the 3 levels of the SHOC benchmark on 5 conventional NVIDIA GPU accelerators and on one low-power accelerator embedded in the Jetson MPSoC board. Conventional accelerators such as the NVIDIA K80 reached a performance of up to 3750 GFLOPS and an energy efficiency of 25 GFLOPS/W, whereas the low-power embedded accelerator TK1 reached only 301 GFLOPS but a higher energy efficiency of 26.2 GFLOPS/W.
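As a rough illustration of how the two reported metrics relate, the sketch below divides the quoted GFLOPS by the quoted GFLOPS/W to obtain the power draw each pair of figures implies. These implied wattages are derived here from the abstract's numbers only; they are not measurements reported by the authors. The names `reported`, `implied_power_watts`, and `units_to_match` are hypothetical, and the unit count assumes ideal linear scaling with no interconnect or host overhead.

```python
import math

# Figures quoted in the abstract: (performance in GFLOPS, efficiency in GFLOPS/W).
reported = {
    "NVIDIA K80": (3750.0, 25.0),
    "TK1": (301.0, 26.2),
}


def implied_power_watts(gflops: float, gflops_per_watt: float) -> float:
    """Power implied by a (performance, efficiency) pair: GFLOPS / (GFLOPS/W) = W."""
    return gflops / gflops_per_watt


def units_to_match(target_gflops: float, unit_gflops: float) -> int:
    """Devices needed to reach a target throughput, ASSUMING ideal linear scaling."""
    return math.ceil(target_gflops / unit_gflops)


# Implied power per device (derived, not measured).
power = {name: implied_power_watts(p, e) for name, (p, e) in reported.items()}

# Relative efficiency advantage of the embedded accelerator.
efficiency_ratio = reported["TK1"][1] / reported["NVIDIA K80"][1]

# Idealized count of TK1 devices needed to match one K80's throughput.
tk1_needed = units_to_match(reported["NVIDIA K80"][0], reported["TK1"][0])

for name, watts in power.items():
    print(f"{name}: about {watts:.1f} W implied")
print(f"TK1 / K80 efficiency ratio: {efficiency_ratio:.3f}")
print(f"TK1 devices to match one K80 (ideal scaling): {tk1_needed}")
```

Read this way, the efficiency gap between the two devices is small (a few percent), while the throughput gap is more than an order of magnitude, which is the trade-off the abstract points to.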
