Generating Optimized Multicore Accelerator Architectures

  • Alba Sandyra Bezerra Lopes IFRN
  • Antonio Carlos Beck UFRGS
  • Marcelo Brandalero UFRGS
  • Monica Pereira UFRN

Abstract

Designing multicores that balance high performance, area, and energy efficiency remains a challenge given the large diversity of embedded applications. In this scenario, combining different hardware processing elements at design time to balance these constraints is crucial to an efficient design. Reconfigurable architectures (RAs) are flexible platforms that can save energy and improve performance by exploiting reconfiguration and parallelism. A key question, however, is whether combining processors and RAs delivers the expected performance and energy improvements at the price of extra area and power. In this work, we propose a new methodology to generate heterogeneous multicore configurations, comprising general-purpose processors (GPPs) and reconfigurable architectures, that optimize a given nonfunctional requirement (e.g., performance or energy) under certain constraints. We combined superscalar processors with coarse-grained reconfigurable architectures and generated optimized multicores for three scenarios. The first is a combination of cores and RAs that achieves the highest possible performance for a set of benchmarks. The second is a combination of cores and RAs limited by a performance threshold, and the third is limited by an energy budget. Our experiments show that the optimized multicore achieved a speedup of more than 2.5x for certain applications. Giving up 10% of the speedup saved more than 11% in energy, and with an energy saving budget of 20% it was possible to save more than 30% in area.
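The three scenarios can be read as three selection rules over a set of candidate configurations. The sketch below is only an illustration of that reading, not the methodology used in the paper: the `Config` type, the function names, the choice of minimizing energy or area once a constraint is met, and every number in `CANDIDATES` are hypothetical assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Config:
    """One candidate multicore (hypothetical model, not the paper's)."""
    name: str
    speedup: float  # relative to a baseline; higher is better
    energy: float   # relative to the same baseline; lower is better
    area: float     # relative to the same baseline; lower is better


# Made-up values for illustration only; these are NOT results from the paper.
CANDIDATES = [
    Config("4 GPP", 1.0, 1.00, 1.00),
    Config("4 GPP + 1 CGRA", 1.6, 0.95, 1.20),
    Config("4 GPP + 2 CGRA", 2.0, 0.90, 1.45),
    Config("4 GPP + 4 CGRA", 2.1, 1.10, 1.90),
]


def fastest(configs: List[Config]) -> Config:
    """Scenario 1: pick the configuration with the highest speedup."""
    return max(configs, key=lambda c: c.speedup)


def min_energy_within_speedup_loss(configs: List[Config],
                                   max_loss: float) -> Config:
    """Scenario 2 (assumed reading): tolerate losing up to `max_loss` of the
    best speedup, then pick the lowest-energy configuration that remains."""
    floor = fastest(configs).speedup * (1.0 - max_loss)
    feasible = [c for c in configs if c.speedup >= floor]
    return min(feasible, key=lambda c: c.energy)


def min_area_within_energy_budget(configs: List[Config],
                                  budget: float) -> Optional[Config]:
    """Scenario 3 (assumed reading): keep configurations whose energy does
    not exceed `budget`, then pick the smallest one; None if none qualifies."""
    feasible = [c for c in configs if c.energy <= budget]
    if not feasible:
        return None
    return min(feasible, key=lambda c: c.area)


if __name__ == "__main__":
    print(fastest(CANDIDATES).name)
    print(min_energy_within_speedup_loss(CANDIDATES, max_loss=0.10).name)
    print(min_area_within_energy_budget(CANDIDATES, budget=0.95).name)
```

In practice the per-configuration figures would come from simulation and power/area modeling rather than a hand-written table, and the search space would be far larger than four entries; the exhaustive filter-then-pick approach here is chosen only to keep the three rules easy to compare.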

Keywords: Multiprocessor/Multicore/Manycore Systems

Published
19/11/2019
LOPES, Alba Sandyra Bezerra; BECK, Antonio Carlos; BRANDALERO, Marcelo; PEREIRA, Monica. Generating Optimized Multicore Accelerator Architectures. In: SIMPÓSIO BRASILEIRO DE ENGENHARIA DE SISTEMAS COMPUTACIONAIS (SBESC), 9., 2019, Natal. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2019. p. 25-32. ISSN 2237-5430.