CUDA Implementation of the NPB Kernels
Abstract
The NAS Parallel Benchmarks (NPB) are a benchmark suite used to evaluate hardware and software, and over the years they have been ported to different frameworks. For GPUs, only OpenCL and OpenACC versions currently exist. This work contributes to the literature by providing the first complete CUDA implementation of the NPB kernels, running experiments with a workload not previously reported, and revealing new facts about the NPB.
References
Griebler, D., Loff, J., Mencagli, G., Danelutto, M., and Fernandes, L. G. (2018). Efficient NAS Benchmark Kernels with C++ Parallel Programming. In 26th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP), PDP’18, pages 733–740, Cambridge, UK. IEEE.
Seo, S., Jo, G., and Lee, J. (2011). Performance Characterization of the NAS Parallel Benchmarks in OpenCL. In 2011 IEEE International Symposium on Workload Characterization (IISWC), pages 137–148.
Tian, X., Xu, R., Yan, Y., Chandrasekaran, S., Eachempati, D., and Chapman, B. (2016). Compiler Transformation of Nested Loops for General Purpose GPUs. Concurrency and Computation: Practice and Experience, 28(2):537–556.
Xu, R., Tian, X., Chandrasekaran, S., Yan, Y., and Chapman, B. (2015). NAS Parallel Benchmarks for GPGPUs Using a Directive-Based Programming Model. In Brodman, J. and Tu, P., editors, Languages and Compilers for Parallel Computing, pages 67–81, Cham. Springer International Publishing.