Scaling and Optimizing the Gysela Code on a Cluster of Many-Core Processors
Abstract
The current generation of the Xeon Phi Knights Landing (KNL) processor provides a highly multi-threaded environment in which standard programming models such as MPI/OpenMP can be used. Many factors affect the performance that applications achieve on these devices: one key point is the efficient exploitation of the SIMD vector units, and another is the memory access pattern. Work has been carried out to adapt a plasma turbulence application, Gysela, to this architecture. Several techniques were used: standard vectorization techniques, auto-tuning of one computation kernel, and switching to a high-order scheme. As a result, KNL execution times have been reduced by up to a factor of 3. This effort also yielded a speedup of 2x on the Broadwell architecture and 3x on Skylake. Good scalability up to a few thousand cores was obtained in a strong-scaling experiment. This incremental work delivered a large payoff without resorting to low-level intrinsics.
Keywords:
Instruction sets, Computer architecture, Plasmas, Registers, Optimization, Hardware, many-core, SIMD, vectorization
Published
24/09/2018
How to Cite
LATU, Guillaume; ASAHI, Yuuichi; BIGOT, Julien; FEHER, Tamas; GRANDGIRARD, Virginie. Scaling and Optimizing the Gysela Code on a Cluster of Many-Core Processors. In: INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE AND HIGH PERFORMANCE COMPUTING (SBAC-PAD), 30., 2018, Lyon/FR. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2018. p. 466-473.
