Optimizing patch matching for image inpainting with different parallel programming interfaces
Abstract
Many image-processing problems are computationally demanding; inpainting methods based on patch replication are one example. These methods solve practical problems such as reconstructing regions of an image that have no content. Because of their cost, they can benefit from exploiting thread-level parallelism (TLP) through parallel programming interfaces (PPIs). However, each PPI manages threads differently, so choosing the right one for an application matters for obtaining the best trade-off between performance and energy consumption, expressed by the energy-delay product (EDP). In this work, we analyze the parallelism potential of an inpainting algorithm widely used in the literature with four PPIs (PThreads, OpenMP, OmpSs-2, and OpenACC) and show which one delivers the best performance, energy consumption, and EDP on three multicore architectures and two GPUs. Our experiments show that OpenMP exploiting TLP with parallel loops is the best choice on AMD processors, whereas OmpSs-2 yields better results on Intel processors.
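The energy-delay product is conventionally the energy consumed multiplied by the execution time, so lower values are better. The minimal Python sketch below ranks candidate interfaces by EDP. The `Measurement` type, the `best_by_edp` helper, the interface labels, and all numbers are hypothetical placeholders for illustration, not measurements or code from this work.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    """One hypothetical run of the application with a given interface."""
    interface: str    # placeholder label, e.g. "interface_a"
    time_s: float     # execution time in seconds
    energy_j: float   # energy consumed in joules

    @property
    def edp(self) -> float:
        # Energy-delay product: energy (J) times delay (s); lower is better.
        return self.energy_j * self.time_s


def best_by_edp(measurements):
    """Return the measurement with the lowest EDP."""
    return min(measurements, key=lambda m: m.edp)


# Placeholder numbers chosen only to illustrate the trade-off.
samples = [
    Measurement("interface_a", time_s=2.0, energy_j=100.0),  # EDP = 200
    Measurement("interface_b", time_s=1.5, energy_j=150.0),  # fastest, EDP = 225
    Measurement("interface_c", time_s=2.5, energy_j=90.0),   # least energy, EDP = 225
]

best = best_by_edp(samples)
print(best.interface, best.edp)
```

With these placeholder values, neither the fastest run nor the most energy-frugal run has the lowest EDP, which is why the metric is useful when performance and energy pull in different directions.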
Published
21/10/2020
How to Cite
PEREIRA, Luan et al. Otimizando a correspondência de patches para o inpainting de imagens com diferentes interfaces de programação paralela. In: SIMPÓSIO EM SISTEMAS COMPUTACIONAIS DE ALTO DESEMPENHO (SSCAD), 21., 2020, Online. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2020. p. 73-84. DOI: https://doi.org/10.5753/wscad.2020.14059.