Concurrency and Interference Analysis of Kernels on GPUs

Pablo Carvalho; Lúcia Maria de A. Drummond; Cristiana Bentes

doi:10.5753/ctd.2021.15757

Pablo Carvalho UFF http://orcid.org/0000-0003-1791-4565
Lúcia Maria de A. Drummond UFF
Cristiana Bentes UERJ

Abstract

Heterogeneous systems that combine CPUs and GPUs are becoming increasingly popular in large-scale data centers and cloud environments. On these platforms, sharing a GPU among different applications is an important way to improve hardware utilization and system throughput. However, challenges arise when GPUs are shared competitively. The decision to execute different kernels simultaneously is made by the hardware and depends on the kernels' resource requirements. Moreover, it is very difficult to understand all the hardware variables involved in these decisions well enough to describe a formal allocation method. In this work, we studied the impact that kernel resource requirements have on concurrent execution and used machine learning (ML) techniques to infer the interference caused by concurrent execution and to classify the slowdown that results from this interference. The ML techniques were evaluated on the GPU benchmark suites Rodinia, Parboil, and SHOC. Our results showed that, among the features selected in the analysis, the number of blocks per grid, the number of threads per block, and the number of registers are the resource consumption features that most affect the performance of concurrent execution.
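To make the role of the three features named above more concrete, the following is a minimal sketch, not the method used in this work, of how blocks per grid, threads per block, and registers per thread could constrain whether two kernels fit on the same streaming multiprocessor (SM). The per-SM limits, the `Kernel` class, and all function names are hypothetical and chosen only for illustration; the real hardware decision involves further variables, such as shared memory, and is not fully documented.

```python
from dataclasses import dataclass

# Hypothetical per-SM limits, for illustration only. Real values depend on
# the GPU architecture and are not taken from the paper.
MAX_THREADS_PER_SM = 2048
MAX_REGISTERS_PER_SM = 65536
MAX_BLOCKS_PER_SM = 32


@dataclass
class Kernel:
    name: str
    blocks_per_grid: int
    threads_per_block: int
    registers_per_thread: int


def registers_per_block(k: Kernel) -> int:
    """Registers needed by one block of k."""
    return k.registers_per_thread * k.threads_per_block


def resident_blocks_per_sm(k: Kernel) -> int:
    """Upper bound on blocks of k that fit on one SM under the limits above."""
    by_threads = MAX_THREADS_PER_SM // k.threads_per_block
    by_registers = MAX_REGISTERS_PER_SM // registers_per_block(k)
    return min(by_threads, by_registers, MAX_BLOCKS_PER_SM)


def fills_gpu(k: Kernel, num_sms: int) -> bool:
    """True if k alone has enough blocks to occupy every block slot."""
    return k.blocks_per_grid >= resident_blocks_per_sm(k) * num_sms


def can_share_sm(a: Kernel, b: Kernel) -> bool:
    """True if one block of a and one block of b fit on the same SM."""
    threads = a.threads_per_block + b.threads_per_block
    registers = registers_per_block(a) + registers_per_block(b)
    return threads <= MAX_THREADS_PER_SM and registers <= MAX_REGISTERS_PER_SM


# Two made-up kernels: one light, one register-heavy.
light = Kernel("light", blocks_per_grid=64, threads_per_block=256, registers_per_thread=32)
heavy = Kernel("heavy", blocks_per_grid=64, threads_per_block=1024, registers_per_thread=64)

print(resident_blocks_per_sm(light), resident_blocks_per_sm(heavy))
print(can_share_sm(light, light), can_share_sm(light, heavy))
```

Under these assumed limits, a kernel with many threads and registers per block leaves no room for a second kernel on the same SM, which is one simplified way to see why these features would matter for concurrency and slowdown.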

Keywords: GPU, High Performance Computing, Machine Learning
