Proposal for a Hardware Implementation of a Deep Neural Network Based on a Stacked Sparse Autoencoder

Maria G. F. Coutinho; Marcelo A. C. Fernandes

Maria G. F. Coutinho, UFRN
Marcelo A. C. Fernandes, UFRN

Abstract

This work proposes a hardware implementation of a Deep Neural Network (DNN) based on the Stacked Sparse Autoencoder (SSAE) technique. The proposed hardware was developed on a Field Programmable Gate Array (FPGA) using fixed-point arithmetic. The systolic array technique was adopted throughout the circuit so that DNNs with many inputs, neurons, and layers can be deployed on the FPGA. All details of the developed architecture are presented, including the hardware resource occupation rate and the processing time for a Virtex 6 xc6vlx240t-1ff1156 FPGA. The results indicate that the implementation achieved high throughput and a significant speedup compared with a state-of-the-art work, which points to the viability of applying the proposal presented in this article to massive-data problems.

Keywords: Deep Learning, Stacked Sparse Autoencoder, Hardware, FPGA, Systolic Array
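
To make the combination of fixed-point arithmetic and a systolic array more concrete, the following is a minimal software sketch of how one layer's weighted sums could be computed by a linear chain of multiply-accumulate cells. It is not the architecture described in the paper: the 8-bit fractional format (`FRAC_BITS`), the function names, and the one-cell-per-neuron layout are all assumptions chosen for illustration, and the activation function is omitted.

```python
FRAC_BITS = 8  # assumed fixed-point format: 8 fractional bits (illustrative only)


def to_fixed(x, frac_bits=FRAC_BITS):
    """Quantize a float to a signed fixed-point integer."""
    return int(round(x * (1 << frac_bits)))


def to_float(q, frac_bits=FRAC_BITS):
    """Convert a fixed-point integer back to a float."""
    return q / (1 << frac_bits)


def systolic_layer(weights, inputs, frac_bits=FRAC_BITS):
    """Weighted sums of one layer using one multiply-accumulate cell per neuron.

    Inputs enter the chain one per step and move one cell to the right on
    every step, so cell j sees input i at step i + j (hypothetical layout).
    """
    n_neurons = len(weights)
    n_inputs = len(inputs)
    acc = [0] * n_neurons        # accumulator register of each cell
    pipe = [None] * n_neurons    # input register of each cell

    for step in range(n_inputs + n_neurons - 1):
        # Shift the input registers one cell to the right and feed a new input.
        new_input = inputs[step] if step < n_inputs else None
        pipe = [new_input] + pipe[:-1]
        for j, x in enumerate(pipe):
            if x is not None:
                i = step - j     # index of the input currently held by cell j
                acc[j] += weights[j][i] * x

    # Each product carries 2 * frac_bits fractional bits; rescale to frac_bits.
    return [a >> frac_bits for a in acc]


W = [[to_fixed(w) for w in row] for row in [[0.5, -0.25], [1.0, 2.0], [0.0, 0.5]]]
x = [to_fixed(v) for v in [1.0, 2.0]]
y = [to_float(q) for q in systolic_layer(W, x)]
print(y)  # [0.0, 5.0, 1.0]
```

In this sketch the number of cells grows with the number of neurons while each cell only talks to its neighbor, which is the property that makes systolic layouts attractive for large layers. A real design would also need to fix the word length and handle overflow, neither of which is modeled here.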
