Performance Improvements of Parallel Applications thanks to MPI-4.0 Hints
Abstract
HPC systems have grown significantly over the past years, with modern machines comprising hundreds of thousands of nodes. The Message Passing Interface (MPI) is the de facto standard for distributed computing on these architectures. On the MPI critical path, message matching is one of the most time-consuming operations: searching a message queue for a specific request accounts for a significant part of the communication latency. So far, no single algorithm performs well in all cases. This paper explores matching specializations enabled by hints introduced in the MPI 4.0 standard. We propose a hash-table-based algorithm that performs constant-time message matching for requests without wildcards. This approach suits the intensive point-to-point communication phases found in many applications (more than 50% of the CORAL benchmarks). We demonstrate that our approach can improve the overall execution time of real HPC applications by up to 25%. We also analyze the limitations of our method and propose a strategy for identifying the most suitable matching algorithm for a given application. To this end, we apply machine learning techniques to classify applications according to the characteristics of their message patterns.
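The abstract's central idea is that once wildcards are ruled out, matching can become a hash lookup. A small sketch helps show why. In MPI 4.0, an application can assert this property on a communicator through the info hints `mpi_assert_no_any_source` and `mpi_assert_no_any_tag`. The C code below is a hypothetical illustration, not the authors' implementation. It makes the following choices:

- Posted receives are keyed on the exact (source, tag) pair.
- Each key keeps its own FIFO, so MPI's in-order matching within a key is preserved.
- Only one communicator is assumed. A real MPI library would also key on the communicator's context.
- The unexpected-message queue is omitted.

The names `match_table_t`, `post_recv`, `match_incoming`, `free_table`, and `NBUCKETS` are invented for this example.

```c
#include <stdlib.h>

#define NBUCKETS 64 /* hypothetical fixed table size */

/* A posted receive; without wildcards, source and tag are exact values. */
typedef struct request {
    int source;
    int tag;
    int id;               /* hypothetical request identifier */
    struct request *next; /* next request with the same key, FIFO order */
} request_t;

/* One entry per distinct (source, tag) key, holding that key's FIFO. */
typedef struct key_queue {
    int source;
    int tag;
    request_t *head, *tail;
    struct key_queue *next; /* next key that hashed to the same bucket */
} key_queue_t;

typedef struct {
    key_queue_t *buckets[NBUCKETS];
} match_table_t;

/* Mix source and tag into a bucket index (unsigned overflow is well defined). */
static unsigned hash_key(int source, int tag) {
    unsigned h = ((unsigned)source * 2654435761u) ^ ((unsigned)tag * 40503u);
    return h % NBUCKETS;
}

/* Find the FIFO for (source, tag); optionally create it if missing. */
static key_queue_t *find_queue(match_table_t *t, int source, int tag, int create) {
    unsigned b = hash_key(source, tag);
    for (key_queue_t *q = t->buckets[b]; q != NULL; q = q->next)
        if (q->source == source && q->tag == tag)
            return q;
    if (!create)
        return NULL;
    key_queue_t *q = calloc(1, sizeof *q);
    if (q == NULL)
        return NULL;
    q->source = source;
    q->tag = tag;
    q->next = t->buckets[b];
    t->buckets[b] = q;
    return q;
}

/* Post a receive by appending it to the FIFO of its (source, tag) key. */
int post_recv(match_table_t *t, int source, int tag, int id) {
    key_queue_t *q = find_queue(t, source, tag, 1);
    request_t *r = malloc(sizeof *r);
    if (q == NULL || r == NULL) {
        free(r);
        return -1;
    }
    r->source = source;
    r->tag = tag;
    r->id = id;
    r->next = NULL;
    if (q->tail != NULL)
        q->tail->next = r;
    else
        q->head = r;
    q->tail = r;
    return 0;
}

/* Match an incoming message: return the id of the oldest posted receive
   with the same (source, tag), or -1 if none (the message is unexpected). */
int match_incoming(match_table_t *t, int source, int tag) {
    key_queue_t *q = find_queue(t, source, tag, 0);
    if (q == NULL || q->head == NULL)
        return -1;
    request_t *r = q->head;
    q->head = r->next;
    if (q->head == NULL)
        q->tail = NULL;
    int id = r->id;
    free(r);
    return id;
}

/* Release every key queue and any receives still pending. */
void free_table(match_table_t *t) {
    for (int b = 0; b < NBUCKETS; b++) {
        key_queue_t *q = t->buckets[b];
        while (q != NULL) {
            request_t *r = q->head;
            while (r != NULL) {
                request_t *rn = r->next;
                free(r);
                r = rn;
            }
            key_queue_t *qn = q->next;
            free(q);
            q = qn;
        }
        t->buckets[b] = NULL;
    }
}
```

A receive posted with `MPI_ANY_SOURCE` or `MPI_ANY_TAG` could match entries stored under many different keys. For that reason, a structure like this is only valid when the hints guarantee that no wildcard receives are posted. Otherwise, a library would have to fall back to scanning a linear queue. Lookup is constant time on average, provided the hash spreads keys evenly across buckets.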
Keywords:
HPC, Distributed programming, MPI Matching, MPI 4.0 Sessions
Published
2022-11-02
How to Cite
MORARU, Maxim; ROUSSEL, Adrien; TABOADA, Hugo; JAILLET, Christophe; PÉRACHE, Marc; KRAJECKI, Michael. Performance Improvements of Parallel Applications thanks to MPI-4.0 Hints. In: INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE AND HIGH PERFORMANCE COMPUTING (SBAC-PAD), 34., 2022, Bordeaux/France. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2022. p. 273-282.