Exploring Memory Affinity on NUMA Architectures (Explorando Afinidade de Memória em Arquiteturas NUMA)

Christiane Pousa Ribeiro; Vania Marangozova-Martin; Jean-Francois Méhaut; Fabrice Dupros; Alexandre Carissimi

doi:10.5753/wscad.2008.17670

Christiane Pousa Ribeiro, Université Grenoble Alpes
Vania Marangozova-Martin, Université Grenoble Alpes
Jean-Francois Méhaut, Université Grenoble Alpes
Fabrice Dupros, BRGM
Alexandre Carissimi, UFRGS

Abstract

NUMA architectures exhibit asymmetric memory latency and bandwidth because the system contains multiple hierarchical levels of memory. Achieving good performance on this type of architecture requires ensuring memory affinity in applications. Operating systems with NUMA support provide memory allocation and thread scheduling policies that aim at memory affinity. However, these policies do not always deliver the best performance for every type of application. Tools and APIs available in these operating systems make it possible to manage memory affinity explicitly in applications. This work presents a performance evaluation of different strategies for explicit memory affinity management, implemented with operating system APIs in parallel applications. The strategies were implemented in a seismic application and in kernels of the NAS benchmark, and were executed on different NUMA architectures. The results show the importance of ensuring memory affinity on NUMA architectures (average gains of up to 80%) and that this affinity can be obtained through operating system APIs.
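To make the notion of memory affinity concrete, the following is a minimal toy model, not the strategies or the code evaluated in the paper. It assumes a hypothetical machine with 4 NUMA nodes and 16 pages, where the thread running on node `p // 4` works on page `p`. It compares which fraction of accesses stays local under three commonly described placement policies (first-touch, interleave, bind). All function and variable names are illustrative, and no real operating system API is called.

```python
# Toy model of NUMA page placement; every name here is hypothetical.
N_NODES = 4
N_PAGES = 16
PAGES_PER_NODE = N_PAGES // N_NODES


def worker_node(page):
    """Node of the thread that works on `page` (block partitioning)."""
    return page // PAGES_PER_NODE


def master_node(page):
    """Node of a single master thread that touches every page."""
    return 0


def place_pages(policy, toucher_node=None, bind_node=0):
    """Return a list mapping each page index to the node that holds it."""
    if policy == "first_touch":
        # A page lands on the node of the thread that touches it first.
        return [toucher_node(p) for p in range(N_PAGES)]
    if policy == "interleave":
        # Pages are spread round-robin over all nodes.
        return [p % N_NODES for p in range(N_PAGES)]
    if policy == "bind":
        # Every page is forced onto one node.
        return [bind_node] * N_PAGES
    raise ValueError(f"unknown policy: {policy}")


def local_fraction(placement):
    """Fraction of pages located on the node of the thread that uses them."""
    local = sum(1 for p, node in enumerate(placement) if worker_node(p) == node)
    return local / len(placement)


results = {
    # Serial initialization: the master thread touches all pages first.
    "first_touch_serial_init": local_fraction(
        place_pages("first_touch", toucher_node=master_node)),
    # Parallel initialization: each worker touches its own block first.
    "first_touch_parallel_init": local_fraction(
        place_pages("first_touch", toucher_node=worker_node)),
    "interleave": local_fraction(place_pages("interleave")),
    "bind_node_0": local_fraction(place_pages("bind", bind_node=0)),
}

for name, fraction in results.items():
    print(f"{name}: {fraction:.2f} of accesses are local")
```

In this model only first-touch with parallel initialization keeps every access local. The other three cases leave a quarter of the accesses local, which illustrates why a default policy may not suit every application. The model ignores caches, contention and page migration, so it says nothing about actual speedups.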
