Simplifying HPC Application Development with OpenMP Cluster and Unified Memory

Jhonatan Cléto; Hervé Yviquel; Marcio M. Pereira; Guido Araújo

doi:10.5753/eradsp.2023.231898

Jhonatan Cléto UNICAMP
Hervé Yviquel UNICAMP
Marcio M. Pereira UNICAMP
Guido Araújo UNICAMP

Abstract

As accelerators such as GPUs and FPGAs become more common in HPC systems, programming these systems becomes more challenging, in part because accelerators add another layer of memory management. This paper presents an extension to OpenMP Cluster (OMPC) that integrates CUDA’s Unified Memory management. An evaluation with a synthetic benchmark shows that, while the extension simplifies the development of GPU-based OMPC applications, further optimization is required to reduce its performance overhead.

