Analysis and Parallelization of a Gridding Algorithm

Carlos A. T. Aguni (USP); Daniel Cordeiro (USP)

doi:10.5753/wperformance.2018.3330

Abstract

This paper analyzes the performance of a parallel implementation of the Gridding algorithm on two platforms: an Intel® Xeon Phi accelerator card (many-core architecture) and an Nvidia Tesla K20x General-Purpose Graphics Processing Unit (GPGPU). We study how well the algorithm suits multi-core and many-core environments when optimized for them, and we profile its resource consumption, including memory, processor, and cache usage, on the CPU and on the accelerator cards.
