Scaling and Optimizing the Gysela Code on a Cluster of Many-Core Processors

  • Guillaume Latu CEA, IRFM
  • Yuuichi Asahi QST Rokkasho Fusion Institute
  • Julien Bigot CEA, Maison de la Simulation
  • Tamas Feher Max Planck Institute for Plasma Physics
  • Virginie Grandgirard CEA, IRFM

Abstract


The current generation of the Xeon Phi Knights Landing (KNL) processor provides a highly multi-threaded environment on which standard programming models such as MPI/OpenMP can be used. Many factors affect the performance that applications achieve on these devices: one key point is the efficient exploitation of the SIMD vector units, and another is the memory access pattern. Work has been conducted to adapt a plasma turbulence application, namely Gysela, to this architecture. Several techniques have been used: standard vectorization techniques, auto-tuning of one computation kernel, and switching to a high-order scheme. As a result, KNL execution times have been reduced by up to a factor of 3. This effort has also yielded a speedup of 2x on the Broadwell architecture and 3x on Skylake. Good scalability up to a few thousand cores has been obtained in a strong scaling experiment. This incremental work brought a large payoff without resorting to low-level intrinsics.
Keywords: Instruction sets, Computer architecture, Plasmas, Registers, Optimization, Hardware, many-core, SIMD, vectorization
Published
24/09/2018
LATU, Guillaume; ASAHI, Yuuichi; BIGOT, Julien; FEHER, Tamas; GRANDGIRARD, Virginie. Scaling and Optimizing the Gysela Code on a Cluster of Many-Core Processors. In: INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE AND HIGH PERFORMANCE COMPUTING (SBAC-PAD), 30., 2018, Lyon/FR. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2018. p. 466-473.