Efficient Sorting on the Tilera Manycore Architecture

  • Alessandro Morari Universitat Politecnica de Catalunya / Pacific Northwest National Laboratory / Barcelona Supercomputing Center
  • Antonino Tumeo Pacific Northwest National Laboratory
  • Oreste Villa Pacific Northwest National Laboratory
  • Simone Secchi Pacific Northwest National Laboratory
  • Mateo Valero Universitat Politecnica de Catalunya / Barcelona Supercomputing Center

Resumo


We present an efficient implementation of the radix sort algorithm for the Tilera TILEPro64 processor. The TILEPro64 is one of the first successful commercial manycore processors. It is composed of 64 tiles interconnected through multiple fast Networks-on-chip and features a fully coherent, shared distributed cache. The architecture has a large degree of flexibility, and allows various optimization strategies. We describe how we mapped the algorithm to this architecture. We present an in-depth analysis of the optimizations for each phase of the algorithm with respect to the processor's sustained performance. We discuss the overall throughput reached by our radix sort implementation (up to 132 MK/s) and show that it provides comparable or better performance-per-watt with respect to state-of-the art implementations on x86 processors and graphic processing units.
Palavras-chave: Tiles, Histograms, Bandwidth, Computer architecture, Instruction sets, Sorting, Optimization
Publicado
24/10/2012
MORARI, Alessandro; TUMEO, Antonino; VILLA, Oreste; SECCHI, Simone; VALERO, Mateo. Efficient Sorting on the Tilera Manycore Architecture. In: INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE AND HIGH PERFORMANCE COMPUTING (SBAC-PAD), 24. , 2012, Nova Iorque/EUA. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2012 . p. 171-178.