Enhancing Programmability in NoC-Based Lightweight Manycore Processors with a Portable MPI Library

  • João Fellipe Uller, UFSC
  • João Vicente Souto, UFSC
  • Pedro Henrique Penna, PUC Minas / Université Grenoble Alpes
  • Márcio Castro, UFSC
  • Henrique Freitas, PUC Minas
  • Jean-François Méhaut, Université Grenoble Alpes


The performance and energy efficiency provided by lightweight manycores are undeniable. However, the lack of rich and portable support for these processors makes software development challenging. To address this problem, we propose a portable and lightweight MPI library (LWMPI), designed from scratch to cope with the restrictions and intricacies of lightweight manycores. We integrated LWMPI into a distributed OS that targets these processors and evaluated it on the Kalray MPPA-256 processor. Results obtained with three applications from a representative benchmark suite show that LWMPI achieves performance scalability similar to that of the low-level, vendor-specific API tailored to the MPPA-256, while exposing a richer programming interface.


How to Cite

ULLER, João Fellipe; SOUTO, João Vicente; PENNA, Pedro Henrique; CASTRO, Márcio; FREITAS, Henrique; MÉHAUT, Jean-François. Enhancing Programmability in NoC-Based Lightweight Manycore Processors with a Portable MPI Library. In: SIMPÓSIO EM SISTEMAS COMPUTACIONAIS DE ALTO DESEMPENHO (WSCAD), 21., 2020, Online. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2020. p. 155-166. DOI: https://doi.org/10.5753/wscad.2020.14066.