Aumentando a Eficiência na Execução de Algoritmos de Grafos em HPC
Abstract
The growing need for extracting information from massive data - structured as graphs - has been pushing the development of even more robust parallel algorithms for such processing. However, the algorithms' communication-bound behavior and the highly irregular structure of the currently used graphs are hurdles to achieving the same levels of performance and efficiency as observed for other parallel applications. In this paper, we show that the scalability of different graph applications varies according to the used algorithms and their databases and, in most cases, using all available cores (i.e., all available processor's cores for execution) is not the best option in terms of efficiency. Based on that, we propose MultGraph, a framework that enables the simultaneous processing of several algorithms/graphs, distributing them in a non-uniform way among the cores, instead of executing them serially (i.e., one after another) with the maximum available parallelism. MultGraph works in two steps: (i) characterizing the algorithms/graphs by their efficiency levels; (ii) defining the allocations (clusters of algorithms to be executed concurrently), number of threads for each of them, and clusters' execution order. Experimental results on three multicore processors (Intel and AMD) show that MultGraph improves in up to 9.21x and on average 4.52x the algorithm's execution time compared to the default execution of applications in HPC systems.
References
Beamer, S., Asanovic, K., and Patterson, D. (2015b). Locality exists in graph processing: Workload characterization on an ivy bridge server. In ISWC, pages 56–65. IEEE.
da Silva, V., Medeiros, T., Rocha, H., Luizelli, M., Rossi, F., Beck, A. C., and Lorenzon, A. (2020). Análise da execução concorrente de aplicações paralelas em arquiteturas multicore. In WSCAD, pages 61–72. SBC.
da Silva, V. S., Nogueira, A. G., de Lima, E. C., de A. Rocha, H. M., Serpa, M. S., Luizelli, M. C., Rossi, F. D., Navaux, P. O., Beck, A. C. S., and Francisco Lorenzon, A. (2021). Smart resource allocation of concurrent execution of parallel applications. CCPE, page e6600.
Eager, D., Zahorjan, J., and Lazowska, E. (1989). Speedup versus efficiency in parallel systems. IEEE TC, 38(3):408–423.
Gleich, D. F. (2014). Pagerank beyond the web.
Heidari, S., Simmhan, Y., Calheiros, R., and Buyya, R. (2018). Scalable graph processing frameworks: A taxonomy and open challenges. CSUR, 51:1–53.
Jorge González-Domínguez, Guillermo L. Taboada, B. B. F. M. J. M. and Touriño, J. (2012). Automatic mapping of parallel applications on multicore architectures using the servet benchmark suite. CEE, 38:258–269.
Leskovec, J. and Krevl, A. (2014). SNAP Datasets: Stanford large network dataset collection. http://snap.stanford.edu/data.
Lorenzon, A. F. and Beck Filho, A. C. S. (2019). Parallel computing hits the power wall: principles, challenges, and a survey of solutions. Springer Nature.
Mofrad, M. H., Melhem, R., Ahmad, Y., and Hammoud, M. (2020). Graphite: a numaaware hpc system for graph analytics based on a new mpi* x parallelism model. Proceedings of the VLDB Endowment, 13(6):783–797.
Raasch, S. E. and Reinhardt, S. K. (2003). The impact of resource partitioning on smt processors. In PACT, pages 15–25.
Rocha, H. M. G. d. A., Schwarzrock, J., Lorenzon, A. F., and Beck, A. C. S. (2021). In Boosting graph analytics by tuning threads and data affinity on numa systems. PDP, pages 161–168. IEEE.
Roy, A., Mihailovic, I., and Zwaenepoel, W. (2013). X-stream: Edge-centric graph processing using streaming partitions. page 472–488.
Shun, J. and Blelloch, G. E. (2013). Ligra: a lightweight graph processing framework for shared memory. In ACM PPoPP, pages 135–146.
Sudarsan, R. and Ribbens, C. J. (2016). Combining performance and priority for scheduling resizable parallel applications. JPDC, 87:55–66.
Suleman, M. A., Qureshi, M. K., and Patt, Y. N. (2008). Feedback-driven threading: Power-efficient and high-performance execution of multi-threaded workloads on cmps. SIGARCH CAN, 36(1):277–286.
Varisteas, G. (2015). Effective cooperative scheduling of task-parallel applications on multiprogrammed parallel architectures. PhD thesis. QC 20151016.
Yan, D., Bu, Y., Tian, Y., and Deshpande, A. (2017). Big graph analytics platforms. FTD, 7(1-2):1–195.
Zhang, K., Chen, R., and Chen, H. (2015). Numa-aware graph-structured analytics. SIGPLAN Not., 50(8):183–193.
