HPCGRA - An Orthogonal Designed CGRA Generator for High Performance Spatial Accelerators
Abstract
Recently, the increasing adoption of domain-specific architectures to execute kernels with high computing density, together with the exploration of sparse architectures based on systolic arrays, has created an ideal scenario for using coarse-grained reconfigurable architectures (CGRAs) to accelerate applications. Unlike systolic arrays, CGRAs can run different sets of kernels while keeping a good balance between energy consumption and performance. In this work, we present HPCGRA, an orthogonally designed CGRA generator for high-performance spatial accelerators. Our tool does not require any expertise in Verilog design. In our approach, the CGRA is designed and implemented orthogonally by wrapping its main building blocks: functional units, interconnection patterns, routing and elastic buffer capabilities, configuration words, and memories. The tool optimizes and simplifies the creation of CGRA architectures from a portable description (a JSON file) and generates generic, scalable, and efficient Verilog RTL code with Veriloggen. It automatically generates CGRAs with up to 46×66 functional units, reaching 1.2 tera-operations per second.
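To make the idea of an orthogonal, description-driven generator more concrete, the following is a minimal Python sketch. The JSON fields (`name`, `rows`, `cols`, `interconnect`, `fu_ops`) and all helper names are hypothetical: the abstract does not give HPCGRA's actual schema, and this sketch stops at an in-memory model instead of emitting Verilog through Veriloggen. What it illustrates is how the interconnection pattern can be selected independently of the functional-unit set.

```python
import json

# Hypothetical interconnect builders; each returns directed FU-to-FU links.
def mesh_links(rows, cols):
    """4-neighbour links with no wraparound at the array edges."""
    links = set()
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((0, 1), (1, 0), (0, -1), (-1, 0)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    links.add(((r, c), (nr, nc)))
    return links

def torus_links(rows, cols):
    """4-neighbour links where edge FUs wrap around to the opposite side."""
    links = set()
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((0, 1), (1, 0), (0, -1), (-1, 0)):
                nr, nc = (r + dr) % rows, (c + dc) % cols
                if (nr, nc) != (r, c):  # skip self-loops in 1-wide arrays
                    links.add(((r, c), (nr, nc)))  # set removes duplicates
    return links

# The interconnect is chosen by name, independently of the FU definition.
INTERCONNECTS = {"mesh": mesh_links, "torus": torus_links}

def build_cgra(json_text):
    """Parse a hypothetical JSON description into FUs and links."""
    desc = json.loads(json_text)
    rows, cols = desc["rows"], desc["cols"]
    links = INTERCONNECTS[desc["interconnect"]](rows, cols)
    fus = {(r, c): list(desc["fu_ops"]) for r in range(rows) for c in range(cols)}
    return {"name": desc["name"], "fus": fus, "links": links}

spec = '{"name": "demo", "rows": 2, "cols": 3, "interconnect": "mesh", "fu_ops": ["add", "mul"]}'
cgra = build_cgra(spec)
print(len(cgra["fus"]), len(cgra["links"]))  # prints "6 14"
```

Changing only the `interconnect` field from `"mesh"` to `"torus"` adds the wraparound links (from 14 to 18 directed links for a 2×3 array) without touching the FU definitions, which is the kind of independence between building blocks that the abstract describes.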