Analysis of the Concurrent Execution of Parallel Applications on Multicore Architectures

Vinicius da Silva; Thiarles Medeiros; Hiago Rocha; Marcelo Luizelli; Fábio Rossi; Antonio Carlos Beck; Arthur Lorenzon

doi:10.5753/wscad.2020.14058

Vinicius da Silva (Unipampa)
Thiarles Medeiros (Unipampa)
Hiago Rocha (UFRGS)
Marcelo Luizelli (Unipampa)
Fábio Rossi (IFFarroupilha)
Antonio Carlos Beck (UFRGS)
Arthur Lorenzon (Unipampa)


Abstract

Thread-level parallelism (TLP) has been widely used to improve the use of computational resources (e.g., cache memories and CPU functional units) in high-performance systems. However, because some applications do not scale with the number of threads, resources sit idle when such an application runs with its ideal number of threads. The concurrent execution of parallel applications can therefore be used to provide better utilization of computational resources without harming the performance and energy consumption of the system as a whole. With this in mind, we carry out an extensive design space exploration by executing twenty-two parallel applications with different characteristics of shared-memory access, IPC (instructions per cycle), and degree of TLP exploitation on two multicore architectures (Intel and AMD). We show which types of applications can run concurrently and still provide better utilization of computational resources. In the most significant case, the ideal combination of parallel applications executing concurrently improves the trade-off between performance and energy consumption by up to 49% compared with running each application individually.
