Analysis of the Concurrent Execution of Parallel Applications on Multicore Architectures

  • Vinicius da Silva (Unipampa)
  • Thiarles Medeiros (Unipampa)
  • Hiago Rocha (UFRGS)
  • Marcelo Luizelli (Unipampa)
  • Fábio Rossi (IFFarroupilha)
  • Antonio Carlos Beck (UFRGS)
  • Arthur Lorenzon (Unipampa)

Abstract


Thread-level parallelism (TLP) has been widely used to optimize the use of computational resources (e.g., cache memories and CPU functional units) in high-performance systems. However, many applications do not scale as the number of threads increases, so some resources remain idle even when an application runs with its ideal number of threads. Hence, the concurrent execution of parallel applications can make better use of computational resources without degrading the performance or increasing the energy consumption of the system. To this end, we carried out an extensive design space exploration, executing twenty-two parallel applications with different shared memory access characteristics, IPC (instructions per cycle), and degrees of TLP exploitation on two multicore architectures (Intel and AMD). We show which kinds of applications can be executed concurrently to make better use of computational resources. In the best case, the ideal combination of parallel applications running concurrently improves the trade-off between performance and energy consumption by up to 49% compared to the individual execution of each application.

Published: 2020-10-21
DA SILVA, Vinicius; MEDEIROS, Thiarles; ROCHA, Hiago; LUIZELLI, Marcelo; ROSSI, Fábio; BECK, Antonio Carlos; LORENZON, Arthur. Análise da Execução Concorrente de Aplicações Paralelas em Arquiteturas Multicore. In: BRAZILIAN SYMPOSIUM ON HIGH PERFORMANCE COMPUTING SYSTEMS (SSCAD), 21., 2020, Online. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2020. p. 61-72. DOI: https://doi.org/10.5753/wscad.2020.14058.