Simplifying HPC Application Development with OpenMP Cluster and Unified Memory

  • Jhonatan Cléto UNICAMP
  • Hervé Yviquel UNICAMP
  • Marcio M. Pereira UNICAMP
  • Guido Araújo UNICAMP


As accelerators such as GPUs and FPGAs become more common in HPC systems, programming these systems becomes more challenging, for example because of the additional layer of memory management they introduce. This paper presents an extension to OpenMP Cluster (OMPC) that integrates CUDA's Unified Memory management. An evaluation using a synthetic benchmark shows that, while this extension simplifies the development of GPU-based OMPC applications, further optimization is required to reduce its impact on performance.

CLÉTO, Jhonatan; YVIQUEL, Hervé; PEREIRA, Marcio M.; ARAÚJO, Guido. Simplifying HPC Application Development with OpenMP Cluster and Unified Memory. In: ESCOLA REGIONAL DE ALTO DESEMPENHO DE SÃO PAULO (ERAD-SP), 14., 2023, São José dos Campos/SP. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2023. p. 25-28. DOI: