AtTune: A Heuristic based Framework for Parallel Applications Autotuning
Abstract
Several factors limit the scalability of parallel applications, such as off-chip bus saturation and data synchronization. Moreover, the high cost of cooling HPC systems, which can outweigh the cost of developing the system itself, means that parallel applications must now meet requirements for both performance and energy consumption. In this work, we propose AtTune, a heuristic-based framework that tunes the number of processes/threads and the CPU frequency to optimize the execution of parallel applications. AtTune is transparent to the user, independent of the input size, and supports different parallel programming models. We evaluated the proposed solution with five well-known kernels implemented in MPI and OpenMP. Experimental results on two real multi-core systems show that AtTune improves energy efficiency, performance, and Energy-Delay Product (EDP) by up to 36%, 11%, and 32%, respectively.
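To make the optimization target concrete, the sketch below computes the Energy-Delay Product (energy multiplied by execution time) for a few configurations of thread count and CPU frequency and picks the one with the lowest value. It is an illustration only: the abstract does not describe AtTune's heuristic, so this uses a plain exhaustive scan instead, and the names `Config`, `edp`, `best_config`, and `SAMPLES`, as well as all the numbers, are hypothetical.

```python
from typing import Callable, Iterable, NamedTuple, Tuple


class Config(NamedTuple):
    threads: int    # number of processes/threads
    freq_mhz: int   # CPU frequency in MHz


def edp(energy_joules: float, time_seconds: float) -> float:
    """Energy-Delay Product: energy consumed multiplied by execution time."""
    return energy_joules * time_seconds


def best_config(
    candidates: Iterable[Config],
    measure: Callable[[Config], Tuple[float, float]],
) -> Tuple[Config, float]:
    """Return the candidate with the lowest EDP, together with that EDP.

    `measure` is a hypothetical callback that returns
    (energy_joules, time_seconds) for one run under a configuration.
    This is an exhaustive scan, not AtTune's heuristic.
    """
    best, best_edp = None, float("inf")
    for cfg in candidates:
        energy, seconds = measure(cfg)
        value = edp(energy, seconds)
        if value < best_edp:
            best, best_edp = cfg, value
    if best is None:
        raise ValueError("no candidate configurations supplied")
    return best, best_edp


# Made-up measurements for illustration only (not results from the paper).
SAMPLES = {
    Config(4, 2000): (120.0, 10.0),
    Config(8, 2000): (150.0, 6.0),
    Config(8, 3000): (210.0, 5.0),
    Config(16, 3000): (300.0, 4.5),
}

# Iterating over the dict yields its keys; lookups stand in for real runs.
chosen, chosen_edp = best_config(SAMPLES, SAMPLES.__getitem__)
print(chosen, chosen_edp)
```

In this made-up data the fastest configuration is not the one with the lowest EDP, which is why a tuner that targets EDP has to weigh energy against time instead of simply maximizing threads and frequency. A heuristic search would aim to reach a similar choice while measuring fewer configurations than the full scan shown here.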