Partitioning GPUs for Improved Scalability
Abstract
To port applications to GPUs, developers must express computational tasks as highly parallel executions with tens of thousands of threads. However, while this fills the GPU's compute resources, it does not necessarily deliver the best efficiency, as a task may scale poorly when run with enough parallelism to fill the GPU. In this work we investigate how to improve throughput by co-scheduling poorly-scaling tasks on sub-partitions of the GPU to increase utilization efficiency. We first study the scalability of typical HPC tasks on GPUs, and then use this insight to improve throughput by extending the StarPU framework to co-schedule tasks on the GPU. We demonstrate that co-scheduling poorly-scaling GPU tasks accelerates the execution of the critical tasks of a Cholesky factorization and improves the overall performance of the application by 9% across a wide range of block sizes.
Keywords:
Kernel, Graphics processing units, Scalability, Throughput, Runtime, Parallel processing, Instruction sets, GPGPU, Kernel Co-execution, Task Scheduling
Published
2016-10-26
How to Cite
JANZÉN, Johan; BLACK-SCHAFFER, David; HUGO, Andra.
Partitioning GPUs for Improved Scalability. In: INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE AND HIGH PERFORMANCE COMPUTING (SBAC-PAD), 28., 2016, Los Angeles/USA.
Proceedings [...].
Porto Alegre: Sociedade Brasileira de Computação, 2016. p. 42-49.
