MPI Broadcast with Floating-Point Compression
Abstract
Scientific applications developed for distributed-memory parallel systems spend part of their total execution time exchanging data between processes. Improving the performance of the communication routines has therefore become increasingly important. In this context, this work investigates the use of a floating-point compression algorithm in the transmission of long messages. The algorithm was implemented in the MPI broadcast primitive, and performance measurements were carried out for different message types on up to 512 processing cores. The results show that compression can significantly speed up the standard MPI broadcast.
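To illustrate the idea, the sketch below shows a simplified, FPC-style floating-point compressor of the kind that could be wrapped around a broadcast. This is not the implementation evaluated in the paper: it replaces FPC's FCM/DFCM hash-table predictors with a plain last-value predictor, but it keeps the core scheme of XORing each double against a prediction and encoding only the nonzero residual bytes.

```c
#include <stdint.h>
#include <string.h>
#include <stddef.h>

/* Simplified FPC-style compression (assumption: last-value predictor
 * instead of FPC's FCM/DFCM tables). Each double is XORed with the
 * previous value; the residual is stored as one header byte (the count
 * of leading zero bytes, 0..8) followed by the nonzero tail bytes.
 * Worst case: 9 bytes per value; smooth data compresses well. */

size_t fpc_compress(const double *in, size_t n, uint8_t *out) {
    uint64_t prev = 0;
    size_t pos = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t bits;
        memcpy(&bits, &in[i], 8);            /* bit pattern of the double */
        uint64_t resid = bits ^ prev;        /* residual vs. prediction   */
        prev = bits;
        int lz = 0;                          /* leading zero bytes        */
        while (lz < 8 && ((resid >> (56 - 8 * lz)) & 0xFF) == 0) lz++;
        out[pos++] = (uint8_t)lz;            /* header byte               */
        for (int b = 8 - lz; b > 0; b--)     /* tail bytes, MSB first     */
            out[pos++] = (uint8_t)(resid >> (8 * (b - 1)));
    }
    return pos;                              /* compressed size in bytes  */
}

size_t fpc_decompress(const uint8_t *in, size_t n_values, double *out) {
    uint64_t prev = 0;
    size_t pos = 0;
    for (size_t i = 0; i < n_values; i++) {
        int lz = in[pos++];
        uint64_t resid = 0;
        for (int b = 8 - lz; b > 0; b--)     /* rebuild the residual      */
            resid = (resid << 8) | in[pos++];
        uint64_t bits = resid ^ prev;        /* undo the prediction XOR   */
        prev = bits;
        memcpy(&out[i], &bits, 8);
    }
    return pos;                              /* bytes consumed            */
}
```

In a compressed broadcast, the root would call `fpc_compress` on the message, broadcast the compressed size and buffer (e.g. with two `MPI_Bcast` calls), and each receiver would run `fpc_decompress`; the scheme pays off when the bandwidth saved on long messages outweighs the (de)compression cost.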