Optimizing a 3D-FWT Code in a Heterogeneous Cluster of Multicore CPUs and Manycore GPUs

  • Gregorio Bernabé University of Murcia
  • Javier Cuenca University of Murcia
  • Domingo Giménez University of Murcia

Abstract


Clusters of nodes composed of manycore GPUs and multicore CPUs are used to solve scientific problems with high computational requirements. Developing and optimizing parallel heterogeneous codes for these systems is a complex task. It requires deep knowledge of the components of the hybrid, heterogeneous and hierarchical computational system, of the scientific problem to be solved, and of the programming paradigms used to solve it efficiently. Techniques for the efficient development and optimization of scientific codes for these systems are therefore needed. This paper presents an analysis of the development and optimization of the 3D-Fast Wavelet Transform (3D-FWT) for a heterogeneous cluster of multicore CPUs and GPUs. Different parallel programming paradigms (message passing, shared memory and SIMD GPU) are combined to fully exploit the computing capacity of the different computational elements of the cluster. The result is an efficient combination of basic codes previously developed for individual components (individual nodes, multicore or GPU) and an important reduction in the compression time of long video sequences.
Keywords: Multicore processing, Graphics processing units, Kernel, Image resolution, Optimization, autotuning engine, 3D-FWT, cluster, manycore GPUs, multicore CPUs
Published
2013-10-23
BERNABÉ, Gregorio; CUENCA, Javier; GIMÉNEZ, Domingo. Optimizing a 3D-FWT Code in a Heterogeneous Cluster of Multicore CPUs and Manycore GPUs. In: INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE AND HIGH PERFORMANCE COMPUTING (SBAC-PAD), 25., 2013, Porto de Galinhas/PE. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2013. p. 97-104.