Energy-Efficient Execution of Stencil Applications on the MPPA-256 Manycore Processor

Emmanuel Podestá Jr.; Alyson D. Pereira; Rodrigo C. O. Rocha; Márcio Castro; Luís F. W. Góes

doi:10.5753/wscad.2017.238

Emmanuel Podestá Jr. UFSC
Alyson D. Pereira UFSC
Rodrigo C. O. Rocha PUC Minas
Márcio Castro UFSC
Luís F. W. Góes PUC Minas

Abstract

This paper proposes an adaptation of the PSkel framework to the low-power MPPA-256 manycore processor. The framework simplifies the development of iterative stencil applications for the MPPA-256 by hiding implementation details from the developer. Results obtained on the MPPA-256 show that the energy consumption of iterative stencil applications is reduced by up to 1.45x compared with an Intel Broadwell multicore processor.
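For readers unfamiliar with the pattern, the following is a minimal sketch of what an iterative stencil computation looks like. It is plain sequential Python, not PSkel's API and not MPPA-256 code. The function names `stencil_step` and `run_iterative_stencil`, the 4-neighbour averaging kernel, and the fixed boundary cells are all assumptions chosen for illustration.

```python
def stencil_step(src, dst):
    """Write one stencil update of `src` into `dst` (interior cells only).

    Hypothetical kernel: each interior cell becomes the average of its
    four direct neighbours. Boundary cells are left untouched.
    """
    rows, cols = len(src), len(src[0])
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            dst[i][j] = 0.25 * (
                src[i - 1][j] + src[i + 1][j] + src[i][j - 1] + src[i][j + 1]
            )


def run_iterative_stencil(grid, iterations):
    """Apply the kernel `iterations` times using two alternating buffers."""
    # Both buffers start as copies, so the fixed boundary values are
    # present in each of them and the caller's grid is never modified.
    src = [row[:] for row in grid]
    dst = [row[:] for row in grid]
    for _ in range(iterations):
        stencil_step(src, dst)
        src, dst = dst, src  # output of this step is input of the next
    return src
```

The two loops over cells and the buffer swap between iterations are the kind of boilerplate a stencil framework can take over, leaving the developer to supply only the per-cell kernel.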
