BinLPT: A Novel Workload-Aware Loop Scheduler for Irregular Parallel Loops

  • Pedro Henrique Penna UFSC / PUC Minas / Univ. Grenoble Alpes
  • Márcio Castro UFSC
  • Patricia Plentz UFSC
  • Henrique C. Freitas PUC Minas
  • François Broquedis Univ. Grenoble Alpes
  • Jean-François Méhaut Univ. Grenoble Alpes

Resumo


Workload-aware loop schedulers were introduced to deliver better performance than classical strategies, but they present limitations on workload estimation, chunk scheduling and integrability with applications. Targeting these challenges, in this work we propose a novel workload-aware loop scheduler that is called BinLPT and it is based on three features. First, it relies on some user-supplied estimation of the workload of the target parallel loop. Second, BinLPT uses a greedy bin packing heuristic to adaptively partition the iteration space in several chunks. The maximum number of chunks to be produced is a parameter that may be fine-tuned. Third, it schedules chunks of iterations using a hybrid scheme based on the LPT rule and on-demand scheduling. We integrated BinLPT in OpenMP, and we evaluated its performance in a large-scale NUMA machine using a synthetic kernel and 3D N-Body Simulations. Our results revealed that BinLPT improves performance over OpenMP’s strategies by up to 45.13% and 37.15% in the synthetic and application kernels, respectively.

##submission.authorBiography##

##submission.authorWithAffiliation##

 

 

Referências

Balasubramaniam, M., Sukhija, N., Ciorba, F., Banicescu, I., and Srivastava, S. (2012). Towards the Scalability of Dynamic Loop Scheduling Techniques via Discrete Event Simulation. In International Parallel and Distributed Processing Symp. Workshops, pages 1343–1351.

Banicescu, I. (2003). On the Scalability of Dynamic Scheduling Scientic Applications with Adaptive Weighted Factoring. Journal of Cluster Computing, 6(3):215–226.

Banicescu, I. and Velusamy, V. (2001). Performance of scheduling scientic applications with adaptive weighted factoring. In International Parallel and Distributed Processing Symp., pages 791–801.

Bull, J. M. (1998). Feedback Guided Dynamic Loop Scheduling: Algorithms and Experiments. In International European Conf. on Parallel and Distributed Computing, pages 377–382.

Cariño, R. and Banicescu, I. (2008). Dynamic load balancing with adaptive factoring methods in scientic applications. Journal of Supercomputing, 44(1):41–63.

Che, S., Boyer, M., Meng, J., Tarjan, D., Sheaffer, J. W., Lee, S. H., and Skadron, K. (2009). Rodinia: A benchmark suite for heterogeneous computing. In International Symp. on Workload Characterization, pages 44–54.

Fang, Z., Tang, P., Yew, P.-C., and Zhu, C.-Q. (1990). Dynamic Processor Self-Scheduling for General Parallel Nested Loops. In IEEE Transactions on Computers, volume 39, pages 919–929.

Graham, R. (1969). Bounds on Multiprocessing Timing Anomalies. SIAM Journal on Applied Mathematics, 17(2):416–429.

Hummel, S., Schonberg, E., and Flynn, L. (1992). Factoring: a method for scheduling parallel loops. Communications of the ACM, 35(8):90–101.

Hurson, A., Lim, J., Kavi, K., and Lee, B. (1997). Parallelization of DOALL and DOACROSS Loops A Survey. Advances in Computers, 45:53–103.

Kejariwal, A., Nicolau, A., and Polychronopoulos, C. (2006). History-Aware SelfScheduling. In International Conf. on Parallel Processing, pages 185–192.

Luke, E. A., Banicescu, I., and Li, J. (1998). The optimal effectiveness metric for parallel application analysis. Information Processing Letters, 66(5):223–229.

Markatos, E. and Le Blanc, T. (1994). Using Processor Afnity in Loop Scheduling on Shared-Memory Multiprocessors. IEEE Transactions on Parallel and Distributed Systems, 5(4):379–400.

Penna, P. H., Castro, M., Freitas, H., Broquedis, F., and Méhaut, J. (2016). Design Methodology for Workload-Aware Loop Scheduling Strategies Based on Genetic Algorithm and Simulation. Concurrency and Computation: Practice and Experience.

Polychronopoulos, C. and Kuck, D. (1987). Guided Self-Scheduling: A Practical Scheduling Scheme for Parallel Supercomputers. IEEE Transactions on Computers, C-36(12):1425–1439.

Springel, V., White, S. D. M., Jenkins, A., Frenk, C. S., Yoshida, N., Gao, L., Navarro, J., Thacker, R., Croton, D., Helly, J., Peacock, J. A., Cole, S., Thomas, P., Couchman, H., Evrard, A., Colberg, J., and Pearce, F. (2005). Simulations of the formation, evolution and clustering of galaxies and quasars. Nature, 435(7042):629–636.

Wang, Y., Ji, W., Shi, F., Zuo, Q., and Deng, N. (2012). Knowledge-Based Adaptive Self-Scheduling. In International Conf. on Network and Parallel Computing, number 60973010 in Lecture Notes in Computer Science, pages 22–32.
Publicado
17/10/2017
HENRIQUE PENNA, Pedro; CASTRO, Márcio; PLENTZ, Patricia; C. FREITAS, Henrique; BROQUEDIS, François; MÉHAUT, Jean-François. BinLPT: A Novel Workload-Aware Loop Scheduler for Irregular Parallel Loops. In: SIMPÓSIO EM SISTEMAS COMPUTACIONAIS DE ALTO DESEMPENHO (SSCAD), 18. , 2017, Campinas. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2017 . p. 220-231. DOI: https://doi.org/10.5753/wscad.2017.251.