Plain: A Tool for Developing Accelerators for FPGA Overlays in the Cloud at Runtime
Abstract
FPGAs offer energy efficiency for developing dataflow accelerators in the cloud. However, several challenges stand in the way of wider adoption, among them compilation time (which can take hours) and the hardware knowledge required to use high-level synthesis languages properly. Recently, the READY tool reduced compilation and configuration time to microseconds. The environment was validated on the Intel/Altera HARP 2 cloud platform. Although READY integrates with C++ for application development, the accelerator itself is described textually as a graph. This work presents the PLAIN extension, which includes an online graphical interface for describing accelerators, automation of the design flow, two levels of simulation, and one level of execution. The tool also reports performance statistics and allows new operators to be created for design space exploration.
References
Chin, S. A., Niu, K. P., Walker, M., Yin, S., Mertens, A., Lee, J., and Anderson, J. H. (2018). Architecture exploration of standard-cell and FPGA-Overlay CGRAs using the open-source CGRA-ME framework. In Int. Symposium on Physical Design.
Dave, S. and Shrivastava, A. (2017). CCF: A CGRA compilation framework. https://github.com/MPSLab-ASU/ccf. Accessed: 2020-08-11.
Ferreira, R., Cardoso, J. M., and Neto, H. C. (2004). An environment for exploring data-driven architectures. In Int. C. Field Programmable Logic and Applications (FPL).
Ferreira, R., Vendramini, J., and Nacif, M. (2011). Dynamic reconfigurable multicast interconnections by using radix-4 multistage networks in fpga. In IEEE International Conference on Industrial Informatics.
Franz, M., Lopes, C. T., Huck, G., Sumer, O., and Bader, G. D. (2016). Cytoscape.js: a graph theory library for visualisation and analysis. Bioinformatics, 32(2).
Intel (2020). Intel Xeon with integrated FPGA systems at PC2. https://wikis.uni-paderborn.de/pc2doc/HARP2. Accessed: 2020-08-11.
JSON (2020). Introducing json. https://www.json.org/json-en.html. Accessed: 2020-07-25.
Krommydas, K., Sasanka, R., and Feng, W.-c. (2016). Bridging the FPGA programmability-portability gap via automatic opencl code generation and tuning. In Int Conf on Application-specific Systems, Architectures and Processors (ASAP).
Luebbers, E., Liu, S., and Chu, M. (2020). Simplify software integration for fpga accelerators with opae.
Mutigwe, C. and Aghdasi, F. (2013). Instruction set usage analysis for application-specific systems design. Int'l Journal of Information Technology and Computer Science, 7(2).
Nane, R., Sima, V.-M., Pilato, C., Choi, J., Fort, B., Canis, A., Chen, Y. T., Hsiao, H., Brown, S., Ferrandi, F., et al. (2015). A survey and evaluation of fpga high-level synthesis tools. IEEE Trans. on CAD of Integrated Circuits and Systems.
Nickolls, J., Buck, I., Garland, M., and Skadron, K. (2008). Scalable parallel programming with CUDA. Queue, 6(2):40–53.
Nowatzki, T., Gangadhar, V., Ardalani, N., and Sankaralingam, K. (2017). Stream-dataflow acceleration. In Int. Symposium on Computer Architecture (ISCA).
Penha, J., Silva, L., Silva, J., Coelho, K., Baranda, H., Nacif, J., and Ferreira, R. (2019). ADD: Accelerator design and deploy - a tool for FPGA high-performance dataflow computing. Concurrency and Computation: Practice and Experience, 31(18).
Silva, L. B. D., Ferreira, R., Canesche, M., Menezes, M. M., Vieira, M. D., Penha, J., Jamieson, P., and Nacif, J. A. M. (2019). READY: A fine-grained multithreading overlay framework for modern CPU-FPGA dataflow applications. ACM Transactions on Embedded Computing Systems (TECS), 18(5s):1–20.
Stanojević, I., Kovačević, M., and Šenk, V. (2019). Application of maxeler dataflow supercomputing to spherical code design. In Exploring the DataFlow Supercomputing Paradigm, pages 133–168. Springer.
Wijtvliet, M., Waeijen, L., and Corporaal, H. (2016). Coarse grained reconfigurable architectures in the past 25 years: Overview and classification. In Int. Conf. on Embedded Computer Systems: Architectures, Modeling and Simulation (SAMOS).
Published
21/10/2020
How to Cite
PASSE, Fernando; BRAGANÇA, Lucas; CANESCHE, Michael; CATHOUD, Felippe; NACIF, José; FERREIRA, Ricardo. Plain: Ferramenta para Desenvolvimento de Aceleradores para Overlays em FPGA na Nuvem em Tempo de Execução. In: SIMPÓSIO EM SISTEMAS COMPUTACIONAIS DE ALTO DESEMPENHO (SSCAD), 21., 2020, Online. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2020. p. 13-24. DOI: https://doi.org/10.5753/wscad.2020.14054.