Evaluation of the Adequacy of the Charm++ Platform for Multicore Architectures with Hierarchical Memory

  • Laércio L. Pilla UFRGS / INRIA / Université de Grenoble
  • Christiane P. Ribeiro INRIA / Université de Grenoble
  • Philippe O. A. Navaux UFRGS
  • Jean-François Méhaut INRIA / Université de Grenoble

Abstract


Multicore nodes with Non-Uniform Memory Access (NUMA) are now a common architecture for high performance computing. On such NUMA nodes, the shared memory is physically distributed, so the cost of a memory access varies with the distance between the core issuing it and the memory bank holding the data. Memory affinity must therefore be controlled to obtain satisfactory performance. In this context, this paper evaluates the Charm++ runtime system on these architectures to verify whether its communication and load balancing mechanisms are adequate for them. The results highlight the need for information about the memory hierarchy and machine topology to improve Charm++’s efficiency.
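To make concrete why topology information matters, the sketch below weights the data exchanged between communicating objects by the NUMA distance between the nodes they are placed on. It is an illustration only, not the paper's method or any Charm++ API. The distance matrix is hypothetical and merely follows the Linux/ACPI SLIT convention, where 10 denotes a local access. The message volumes, the two placements, and the helpers `numa_factor` and `comm_cost` are likewise invented for this example.

```python
# Hypothetical SLIT-style distance matrix for a 4-node NUMA machine
# (10 = local access; larger values mean proportionally slower remote access).
# Illustrative values only, not measurements from the paper.
DISTANCES = [
    [10, 20, 20, 30],
    [20, 10, 30, 20],
    [20, 30, 10, 20],
    [30, 20, 20, 10],
]


def numa_factor(distances, src, dst):
    """Relative cost of an access from node src to memory on node dst."""
    return distances[src][dst] / distances[src][src]


def comm_cost(volume, mapping, distances):
    """Sum of data exchanged between object pairs, weighted by NUMA factor.

    volume[i][j] is the (hypothetical) amount of data object i sends to j;
    mapping[i] is the NUMA node on which object i is placed.
    """
    total = 0.0
    n = len(volume)
    for i in range(n):
        for j in range(n):
            if volume[i][j]:
                total += volume[i][j] * numa_factor(distances, mapping[i], mapping[j])
    return total


# Four objects: pairs 0<->1 and 2<->3 communicate heavily, others lightly.
VOLUME = [
    [0, 100, 1, 1],
    [100, 0, 1, 1],
    [1, 1, 0, 100],
    [1, 1, 100, 0],
]

# Both placements put exactly one object per node, so the load is identical;
# only their relation to the machine topology differs.
oblivious = [0, 3, 1, 2]  # heavy pairs land on the most distant nodes
aware = [0, 1, 2, 3]      # heavy pairs land on neighbouring nodes

if __name__ == "__main__":
    print(comm_cost(VOLUME, oblivious, DISTANCES))
    print(comm_cost(VOLUME, aware, DISTANCES))
```

With these made-up numbers the topology-oblivious placement costs 1216.0 units and the topology-aware one 820.0, even though both are perfectly balanced. A load balancer that sees only object loads cannot distinguish the two placements.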

Published: 2011-07-19
PILLA, Laércio L.; RIBEIRO, Christiane P.; NAVAUX, Philippe O. A.; MÉHAUT, Jean-François. Evaluation of the Adequacy of the Charm++ Platform for Multicore Architectures with Hierarchical Memory. In: WORKSHOP ON PERFORMANCE OF COMPUTER AND COMMUNICATION SYSTEMS (WPERFORMANCE), 10., 2011, Natal/RN. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2011. p. 2088-2099. ISSN 2595-6167.