Extensible Neural Processing Unit Based on a Programmable Logic Device

Thiago Cruz; Jemerson Damasio; Danilo Santos; Danyllo Albuquerque; Mirko Perkusich; Hyggo Almeida

doi:10.5753/ise.2022.227056

Thiago Cruz UFCG
Jemerson Damasio UFCG
Danilo Santos UFCG
Danyllo Albuquerque UFCG
Mirko Perkusich UFCG
Hyggo Almeida UFCG

Abstract

Deep learning is a powerful technique for solving complex learning problems. With the growth of embedded devices, combined with the demand for low latency and continuous improvement, trained models increasingly need to run efficiently. To meet these demands while keeping energy cost low, this paper reports on the experience of developing a Neural Processing Unit based on a scalable accelerator architecture for large-scale deep learning networks, using a Field-Programmable Gate Array (FPGA) as the hardware prototype.

Keywords: Artificial Intelligence, Deep Learning, FPGA, Performance, Hardware Accelerator
