Maximizing GPU Resource Usage by Reordering the Submission of Concurrent Kernels

  • Bernardo Breder UFF
  • Eduardo Charles UFF
  • Rommel Cruz UFF
  • Esteban Clua UFF
  • Cristiana Bentes UERJ
  • Lucia Drummond UFF

Abstract


The increasing amount of resources available on current GPUs has sparked new interest in the problem of sharing those resources among different kernels. While new generations of GPUs support concurrent kernel execution, their scheduling decisions are made by the hardware at runtime. These hardware decisions, however, depend heavily on the order in which the kernels are submitted for execution. In this work, we propose a novel optimization approach that reorders kernel invocations to maximize resource utilization and thereby improve the average turnaround time. We model the assignment of kernels to the hardware resources as a series of knapsack problems and use a dynamic programming approach to solve them. We evaluate our method using kernels with different sizes and resource requirements. Our results show significant gains in average turnaround time and system throughput compared to the standard kernel submission implemented in modern GPUs.
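The idea of treating kernel reordering as a series of knapsack problems can be illustrated with a minimal sketch. It is not the authors' formulation: it assumes a single integer resource capacity, uses each kernel's demand as its knapsack value (so each round maximizes utilization), and the names `Kernel`, `knapsack`, and `reorder` are hypothetical. Each round solves a 0/1 knapsack by dynamic programming over the kernels not yet submitted, and the chosen subset becomes the next group in the submission order.

```python
from typing import List, NamedTuple


class Kernel(NamedTuple):
    name: str
    demand: int  # assumed: resource units the kernel needs (hypothetical model)
    value: int   # assumed: benefit of scheduling it in the current round


def knapsack(kernels: List[Kernel], capacity: int) -> List[int]:
    """Return indices of a subset with maximum total value within capacity."""
    n = len(kernels)
    # best[i][c] = best value using the first i kernels with capacity c
    best = [[0] * (capacity + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        demand, value = kernels[i - 1].demand, kernels[i - 1].value
        for c in range(capacity + 1):
            best[i][c] = best[i - 1][c]
            if demand <= c and best[i - 1][c - demand] + value > best[i][c]:
                best[i][c] = best[i - 1][c - demand] + value
    # Walk the table backwards to recover which kernels were taken.
    chosen, c = [], capacity
    for i in range(n, 0, -1):
        if best[i][c] != best[i - 1][c]:
            chosen.append(i - 1)
            c -= kernels[i - 1].demand
    chosen.reverse()
    return chosen


def reorder(kernels: List[Kernel], capacity: int) -> List[List[str]]:
    """Build a submission order as a list of rounds, one knapsack per round."""
    remaining = list(kernels)
    order = []
    while remaining:
        picked = knapsack(remaining, capacity)
        if not picked:
            # Nothing fits (or nothing has value): submit one kernel alone
            # so the loop always makes progress.
            picked = [0]
        picked_set = set(picked)
        order.append([remaining[i].name for i in picked])
        remaining = [k for i, k in enumerate(remaining) if i not in picked_set]
    return order


if __name__ == "__main__":
    pending = [Kernel("A", 7, 7), Kernel("B", 5, 5),
               Kernel("C", 3, 3), Kernel("D", 4, 4)]
    print(reorder(pending, capacity=10))
```

With a capacity of 10 units, the first round pairs A with C because together they fill the capacity exactly, and B and D are left for the second round. A real GPU exposes several resources at once (for example registers, shared memory, and thread slots), so a faithful model would need a multi-dimensional capacity; the single-resource version here only shows the shape of the dynamic program.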

Published
2016-10-05
BREDER, Bernardo; CHARLES, Eduardo; CRUZ, Rommel; CLUA, Esteban; BENTES, Cristiana; DRUMMOND, Lucia. Maximizando o Uso dos Recursos de GPU Através da Reordenação da Submissão de Kernels Concorrentes. In: BRAZILIAN SYMPOSIUM ON HIGH PERFORMANCE COMPUTING SYSTEMS (SSCAD), 17., 2016, Aracajú. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2016. p. 251-262. DOI: https://doi.org/10.5753/wscad.2016.14264.