Evaluating the Suitability of the Charm++ Platform for Multicore Architectures with Hierarchical Memory

Laércio L. Pilla; Christiane P. Ribeiro; Philippe O. A. Navaux; Jean-François Méhaut

Laércio L. Pilla UFRGS / INRIA / Université de Grenoble
Christiane P. Ribeiro INRIA / Université de Grenoble
Philippe O. A. Navaux UFRGS
Jean-François Méhaut INRIA / Université de Grenoble

Abstract

Multicore machines with a NUMA memory organization currently serve as the architectural basis for high-performance computing. In these NUMA nodes, shared memory is physically distributed, so the cost of a memory access can vary with the distance between the processor issuing it and the memory bank that holds the data. Memory affinity control is therefore required to ensure satisfactory performance. In this context, this paper presents an evaluation of the Charm++ platform on these architectures to assess how well it suits them in terms of communication and load balancing. The results highlight that the Charm++ platform needs information about the memory hierarchy and the machine topology to become more efficient.
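As a minimal illustration of why access cost depends on distance, the sketch below models a hypothetical four-node NUMA machine with a relative distance matrix in the style of the ACPI SLIT table that Linux exposes (for example through `numactl --hardware`), where 10 denotes a local access. The matrix values, the function name `average_access_cost`, and the one-access-per-page cost model are assumptions for illustration only: they ignore caches, contention, and bandwidth, and are not taken from the paper's experiments.

```python
# Hypothetical 4-node NUMA machine. Values follow the relative-distance
# convention of the ACPI SLIT table as reported by Linux (`numactl --hardware`):
# 10 means a local access; larger values mean proportionally slower accesses.
# These numbers are illustrative assumptions, not measurements.
DISTANCE = [
    [10, 16, 16, 22],
    [16, 10, 22, 16],
    [16, 22, 10, 16],
    [22, 16, 16, 10],
]


def average_access_cost(thread_node: int, page_nodes: list[int]) -> float:
    """Mean relative cost for a thread on `thread_node` touching each page once.

    page_nodes[i] is the NUMA node whose memory holds page i.
    """
    if not page_nodes:
        return 0.0
    return sum(DISTANCE[thread_node][node] for node in page_nodes) / len(page_nodes)


if __name__ == "__main__":
    n_nodes = len(DISTANCE)
    thread_node = 0
    n_pages = 8

    # Affinity-aware placement: every page lives on the thread's own node.
    local = [thread_node] * n_pages
    # Affinity-oblivious placement: pages spread round-robin over all nodes.
    interleaved = [i % n_nodes for i in range(n_pages)]

    print("local:      ", average_access_cost(thread_node, local))
    print("interleaved:", average_access_cost(thread_node, interleaved))
```

With these assumed distances, keeping every page on the thread's own node gives a mean relative cost of 10.0, while round-robin placement gives 16.0; closing this kind of gap is what memory affinity control aims to do.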
