SAPIVe: Simple AVX to PIM Vectorizer

Rodrigo M. Sokulski; Paulo C. Santos; Sairo R. dos Santos; Marco A. Z. Alves

Rodrigo M. Sokulski Universidade Federal do Paraná https://orcid.org/0000-0002-0484-4003
Paulo C. Santos Instituto Federal do Rio Grande do Sul http://orcid.org/0000-0001-8555-2637
Sairo R. dos Santos Universidade Federal Rural do Semi-Árido https://orcid.org/0000-0001-9981-5231
Marco A. Z. Alves Universidade Federal do Paraná https://orcid.org/0000-0003-2440-2664

Abstract

Larger vector extensions are a common technique for meeting the growing demands placed on computing systems. These extensions, which operate on multiple data elements with a single instruction, put heavy pressure on the memory hierarchy and aggravate growing problems such as the memory wall and the von Neumann bottleneck. One way to work around these problems is to add processing elements close to the memory, an approach known as Processing-In-Memory (PIM). As with processor vector extensions, the most efficient PIM techniques use in-memory vector processing units. Code can be converted to in-memory vector processing in several ways. One of them, hardware binary translation, need not depend on programmers or adapted software and can be carried out transparently to users. In the context of in-memory processing, however, this conversion technique faces challenges related to the format of PIM instructions and the structure of the loops in each application. This article therefore proposes and evaluates Simple AVX to PIM Vectorizer (SAPIVe), a hardware binary translation mechanism that converts processor vector instructions into in-memory vector instructions, which not only process more data but also perform loads, operations, and stores at once. Our results show that the mechanism can accelerate kernels by up to 5 times, and that loop predictors can prevent possible performance losses.
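To make the idea of fusing loads, operations, and stores more concrete, the following toy model contrasts a processor-style vector loop with a wider, fused in-memory instruction. It is only a sketch under stated assumptions: the lane widths `AVX_LANES` and `PIM_LANES` and both function names are hypothetical choices made for illustration, not parameters or interfaces of SAPIVe, and the instruction counts it returns are not performance predictions.

```python
# Toy model only: the widths below are illustrative assumptions,
# not parameters taken from SAPIVe.
AVX_LANES = 8    # hypothetical: elements per processor vector register
PIM_LANES = 64   # hypothetical: elements per in-memory vector instruction


def avx_style_add(a, b):
    """Element-wise add issued as separate load, load, add, store steps."""
    out = [0] * len(a)
    issued = 0
    for i in range(0, len(a), AVX_LANES):
        va = a[i:i + AVX_LANES]                # vector load
        vb = b[i:i + AVX_LANES]                # vector load
        vc = [x + y for x, y in zip(va, vb)]   # vector add
        out[i:i + AVX_LANES] = vc              # vector store
        issued += 4                            # four instructions per chunk
    return out, issued


def pim_style_add(a, b):
    """Same add, modeled as one fused load-operate-store per wider chunk."""
    out = [0] * len(a)
    issued = 0
    for i in range(0, len(a), PIM_LANES):
        chunk_a = a[i:i + PIM_LANES]
        chunk_b = b[i:i + PIM_LANES]
        out[i:i + PIM_LANES] = [x + y for x, y in zip(chunk_a, chunk_b)]
        issued += 1                            # one fused instruction per chunk
    return out, issued
```

Both functions return the same result vector; only the number of modeled instructions differs, which is the property a translator of this kind would exploit when it recognizes a vectorized loop.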

Keywords: processing-in-memory, hardware, binary, translator, vectorizer