Efficient Sorting on the Tilera Manycore Architecture
Resumo
We present an efficient implementation of the radix sort algorithm for the Tilera TILEPro64 processor. The TILEPro64 is one of the first successful commercial manycore processors. It is composed of 64 tiles interconnected through multiple fast Networks-on-chip and features a fully coherent, shared distributed cache. The architecture has a large degree of flexibility, and allows various optimization strategies. We describe how we mapped the algorithm to this architecture. We present an in-depth analysis of the optimizations for each phase of the algorithm with respect to the processor's sustained performance. We discuss the overall throughput reached by our radix sort implementation (up to 132 MK/s) and show that it provides comparable or better performance-per-watt with respect to state-of-the art implementations on x86 processors and graphic processing units.
Palavras-chave:
Tiles, Histograms, Bandwidth, Computer architecture, Instruction sets, Sorting, Optimization
Publicado
24/10/2012
Como Citar
MORARI, Alessandro; TUMEO, Antonino; VILLA, Oreste; SECCHI, Simone; VALERO, Mateo.
Efficient Sorting on the Tilera Manycore Architecture. In: INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE AND HIGH PERFORMANCE COMPUTING (SBAC-PAD), 24. , 2012, Nova Iorque/EUA.
Anais [...].
Porto Alegre: Sociedade Brasileira de Computação,
2012
.
p. 171-178.
