Scalability and Efficiency Analysis of LU Factorization using CPU vs GPU

  • Estevan Braz Brandt Costa UEL
  • Fabio Takeshi Matsunaga UEL
  • Jacques Duílio Brancher UEL

Abstract


With the advent of the GPU (Graphics Processing Unit) and its use in mathematical applications through GPGPU (General-Purpose computing on Graphics Processing Units), many platforms have arisen that let developers take advantage of this architecture. Although these platforms have made developing GPU-based algorithms simpler and faster, there is still much to consider when designing an algorithm that uses this technology. This work studies the main characteristics that must be taken into account when developing an algorithm to run on a GPU. LU factorization was analyzed as a case study, and the results showed a performance gain of approximately 93% with all implemented optimizations applied. The main factors that contributed to the improvement were memory management and the types of processes and data that are executed in, and transferred to, the kernels.
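
For readers unfamiliar with the case study, LU factorization writes a square matrix A as the product of a unit lower-triangular matrix L and an upper-triangular matrix U. The following is a minimal sequential sketch in Python (Doolittle variant, no pivoting). It is an illustration only, not the implementation evaluated in this work; the function names `lu_decompose` and `matmul` are hypothetical.

```python
def lu_decompose(a):
    """Doolittle LU factorization without pivoting.

    Returns (lower, upper) such that a = lower * upper, where lower has
    ones on its diagonal. Assumes a is square with nonzero pivots.
    """
    n = len(a)
    lower = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    upper = [[float(x) for x in row] for row in a]
    for k in range(n):
        pivot = upper[k][k]
        if pivot == 0.0:
            raise ZeroDivisionError("zero pivot: this sketch does no row pivoting")
        # For a fixed k, the updates of rows k+1..n-1 do not depend on each
        # other, which is the kind of work a GPU kernel could run in parallel.
        for i in range(k + 1, n):
            factor = upper[i][k] / pivot
            lower[i][k] = factor
            for j in range(k, n):
                upper[i][j] -= factor * upper[k][j]
    return lower, upper


def matmul(x, y):
    """Plain triple-loop matrix product, used only to check the result."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]
```

Multiplying the two factors back together with `matmul` should reproduce the input matrix, up to floating-point rounding.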

Published
2013-07-23
COSTA, Estevan Braz Brandt; MATSUNAGA, Fabio Takeshi; BRANCHER, Jacques Duílio. Scalability and Efficiency Analysis of LU Factorization using CPU vs GPU. In: WORKSHOP ON PERFORMANCE OF COMPUTER AND COMMUNICATION SYSTEMS (WPERFORMANCE), 12., 2013, Maceió/AL. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2013. p. 918-928. ISSN 2595-6167.