Scaling and Optimizing the Gysela Code on a Cluster of Many-Core Processors

Guillaume Latu; Yuuichi Asahi; Julien Bigot; Tamas Feher; Virginie Grandgirard

Guillaume Latu: CEA, IRFM
Yuuichi Asahi: QST Rokkasho Fusion Institute
Julien Bigot: CEA, Maison de la Simulation
Tamas Feher: Max Planck Institute for Plasma Physics
Virginie Grandgirard: CEA, IRFM

Abstract

The current generation of the Xeon Phi Knights Landing (KNL) processor provides a highly multi-threaded environment on which standard programming models such as MPI/OpenMP can be used. Many factors affect the performance that applications achieve on these devices: one key point is the efficient exploitation of the SIMD vector units, and another is the memory access pattern. Work has been conducted to adapt a plasma turbulence application, Gysela, to this architecture. Several techniques were used: standard vectorization techniques, auto-tuning of one computation kernel, and switching to a high-order scheme. As a result, KNL execution times have been reduced by up to a factor of 3. This effort also yielded a speedup of 2x on the Broadwell architecture and 3x on Skylake. Good scalability up to a few thousand cores was obtained in a strong scaling experiment. This incremental work brought a large payoff without resorting to low-level intrinsics.

Keywords: Instruction sets, Computer architecture, Plasmas, Registers, Optimization, Hardware, many-core, SIMD, vectorization