Efficient Scan Operator Methods on a GPU

  • Adrián P. Diéguez University of A Coruña
  • Margarita Amor University of A Coruña
  • Ramón Doallo University of A Coruña

Abstract


Current GPUs (Graphics Processing Units) offer high computational power at relatively low cost. Nonetheless, this enhanced performance often comes at the expense of flexibility and with increased code complexity, and efficient GPU programming requires detailed knowledge of certain hardware aspects. The scan operator is an important building block for a wide range of algorithms. In this paper, we present a number of parallel scan methods based on the traditional cyclic reduction tridiagonal solver and the Ladner-Fischer parallel prefix adder. Furthermore, we analyze a set of new features introduced in the Nvidia Kepler architecture, such as the read-only data cache and shuffle instructions. Our methods perform well in many cases, achieving up to a 48% improvement over the CUDA Data Parallel Primitives (CUDPP) library.
Keywords: Instruction sets, Proposals, Graphics processing units, Kernel, Complexity theory, Arrays, Registers
Published
22/10/2014
DIÉGUEZ, Adrián P.; AMOR, Margarita; DOALLO, Ramón. Efficient Scan Operator Methods on a GPU. In: INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE AND HIGH PERFORMANCE COMPUTING (SBAC-PAD), 26., 2014, Paris/FR. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2014. p. 190-197.