A high-performance approach to dense matrix multiplication on reconfigurable systems

Viviane Lucy S. Souza; Victor W. C. de Medeiros; Derci de O. Lima; Manoel E. de Lima

doi:10.5753/wscad.2009.17395

Viviane Lucy S. Souza UFPE
Victor W. C. de Medeiros UFPE
Derci de O. Lima UFPE
Manoel E. de Lima UFPE

Abstract

Demand for high-performance machines and for new strategies to improve data processing in scientific computing applications has grown considerably in recent years. New architectures based on GPUs, Cell processors, and FPGAs, as well as hybrid platforms, have emerged as solutions to these problems. In this work we present a high-performance architecture for implementing dense matrix multiplication on a commercial hybrid platform, RASC (Reconfigurable Application-Specific Computing). RASC was developed by Silicon Graphics and consists of a general-purpose processor coupled to FPGA-based coprocessors. The proposed architecture investigates how the solution of the matrix multiplication problem can take advantage of the characteristics of a platform with a high degree of parallelism. We also investigate the scalability of the algorithm and its data reuse mechanisms. Based on these investigations, a case study is proposed and discussed in detail.
