Structured platform-aware programming
Abstract
Platform-aware programming is a common practice among HPC performance engineers, and it is becoming more challenging as parallel computing platforms grow more heterogeneous. This paper proposes a structured approach to platform-aware programming based on three concepts: platform typing, multiple dispatch, and feature detection. The approach has been implemented and evaluated through a proof-of-concept prototype built in Julia. The evaluation provides evidence that structured platform-aware programming offers better modularity and ease of maintenance, with only a minor performance overhead.
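As a rough illustration of how the three concepts could fit together, the following minimal Julia sketch models platform features as types, selects a kernel variant through multiple dispatch, and picks the platform type through a stand-in detection step. Every name here (`Accelerator`, `NvidiaGPU`, `AmdGPU`, `detect_accelerator`, `saxpy_variant`, and the `DEMO_ACCELERATOR` variable) is hypothetical and chosen for this example; none of it is taken from the prototype's actual API.

```julia
# Platform typing: features of the platform are modeled as a type hierarchy.
# All type names are hypothetical, for illustration only.
abstract type Accelerator end
struct NoAccelerator <: Accelerator end
abstract type GPU <: Accelerator end
struct NvidiaGPU <: GPU end
struct AmdGPU <: GPU end

# Feature detection: a stand-in that reads a made-up environment variable.
# A real detector would query the hardware or a platform description file.
function detect_accelerator(env = ENV)
    kind = lowercase(get(env, "DEMO_ACCELERATOR", "none"))
    kind == "nvidia" && return NvidiaGPU()
    kind == "amd" && return AmdGPU()
    return NoAccelerator()
end

# Multiple dispatch: one method per level of platform specificity.
# Julia selects the most specific method that matches the detected type.
saxpy_variant(::Accelerator) = "generic CPU fallback"
saxpy_variant(::GPU) = "generic GPU kernel"
saxpy_variant(::NvidiaGPU) = "kernel tuned for NVIDIA GPUs"

# Usage: a Dict stands in for the environment so the example is reproducible.
platform = detect_accelerator(Dict("DEMO_ACCELERATOR" => "amd"))
println(saxpy_variant(platform))
```

In this sketch no method is written for `AmdGPU`, so the call falls back to the method defined for the more general `GPU` type. New platform-specific variants can then be added as separate methods without touching the existing ones, which is one way to read the modularity claim.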