Partitioning GPUs for Improved Scalability
Abstract
To port applications to GPUs, developers must express computational tasks as highly parallel executions with tens of thousands of threads. However, while this fills the GPU's compute resources, it does not necessarily deliver the best efficiency, as a task may scale poorly when run with enough parallelism to fill the GPU. In this work we investigate how to improve throughput by co-scheduling poorly-scaling tasks on sub-partitions of the GPU to increase utilization efficiency. We first study the scalability of typical HPC tasks on GPUs, and then use this insight to improve throughput by extending the StarPU framework to co-schedule tasks on the GPU. We demonstrate that co-scheduling poorly-scaling GPU tasks accelerates the execution of the critical tasks of a Cholesky factorization and improves the overall performance of the application by 9% across a wide range of block sizes.
Keywords:
Kernel, Graphics processing units, Scalability, Throughput, Runtime, Parallel processing, Instruction sets, GPGPU, Kernel Co-execution, Task Scheduling
Published
2016-10-26
How to Cite
JANZÉN, Johan; BLACK-SCHAFFER, David; HUGO, Andra.
Partitioning GPUs for Improved Scalability. In: INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE AND HIGH PERFORMANCE COMPUTING (SBAC-PAD), 28., 2016, Los Angeles/USA.
Proceedings [...].
Porto Alegre: Sociedade Brasileira de Computação, 2016. p. 42-49.
