A C++ API for programs with parallel loops and support for multi-CPUs and multi-GPUs

  • Daniel Di Domenico UFSM
  • João Lima UFSM

Abstract


This paper presents a high-level C++ API for implementing parallel programs with loops and reductions. It aims to address the lack of APIs that support building applications able to run simultaneously on multi-CPUs and multi-GPUs. The hypothesis is that scientific applications can exploit heterogeneous processing on multi-CPUs and multi-GPUs to achieve better performance than when using a single accelerator. Results from experiments with scientific mini-applications developed with the new API suggest that combining CPUs and GPUs can bring performance gains.
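To make the idea of a loop-and-reduction interface concrete, the following is a minimal sketch of what such calls could look like. It is not the API proposed in the paper: the names `parallel_for`, `parallel_reduce`, and `sum_of_squares` are hypothetical, the sketch uses only standard C++ threads, and it covers CPU workers only (dispatching chunks to GPUs is out of scope here). It assumes that `identity` is a true neutral element of `combine` and that `combine` is associative.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper (an assumption, not the paper's API): splits [0, n)
// into contiguous chunks, evaluates body(i) for each index on CPU threads,
// and merges the per-thread partial results with combine.
template <typename T, typename Body, typename Combine>
T parallel_reduce(std::size_t n, T identity, Body body, Combine combine,
                  unsigned num_workers = std::thread::hardware_concurrency()) {
    if (num_workers == 0) num_workers = 1;  // hardware_concurrency() may return 0
    std::vector<T> partial(num_workers, identity);
    std::vector<std::thread> workers;
    const std::size_t chunk = (n + num_workers - 1) / num_workers;  // ceiling division
    for (unsigned w = 0; w < num_workers; ++w) {
        const std::size_t begin = std::min(n, w * chunk);
        const std::size_t end = std::min(n, begin + chunk);
        workers.emplace_back([&, w, begin, end] {
            T local = identity;  // thread-private accumulator
            for (std::size_t i = begin; i < end; ++i) local = combine(local, body(i));
            partial[w] = local;  // each thread writes only its own slot
        });
    }
    for (auto& t : workers) t.join();
    T result = identity;
    for (const T& p : partial) result = combine(result, p);
    return result;
}

// Hypothetical parallel loop built on the reduction above; the dummy
// integer result is discarded.
template <typename Body>
void parallel_for(std::size_t n, Body body,
                  unsigned num_workers = std::thread::hardware_concurrency()) {
    parallel_reduce(n, 0,
                    [&](std::size_t i) { body(i); return 0; },
                    [](int a, int b) { return a + b; },
                    num_workers);
}

// Example use: sum of i*i for i in [0, n).
long long sum_of_squares(std::size_t n) {
    return parallel_reduce(n, 0LL,
                           [](std::size_t i) { return static_cast<long long>(i) * static_cast<long long>(i); },
                           [](long long a, long long b) { return a + b; });
}
```

In this sketch, each worker accumulates into a private variable and writes one slot of `partial`, so the loop body needs no locking as long as iterations are independent. A heterogeneous runtime of the kind the abstract describes would additionally have to decide which chunks go to which device and move data accordingly, which this example does not attempt.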

Published
5 October 2016
DI DOMENICO, Daniel; LIMA, João. Uma API em linguagem C++ para programas com laços paralelos e suporte a multi-CPUs e multi-GPUs. In: SIMPÓSIO EM SISTEMAS COMPUTACIONAIS DE ALTO DESEMPENHO (SSCAD), 17., 2016, Aracajú. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2016. p. 85-96. DOI: https://doi.org/10.5753/wscad.2016.14250.