Graphs Based on IR as Representation of Code: Types and Insights

Anderson Faustino UEM

Abstract

Mainstream compilers infer code properties from data structures such as trees and graphs. Graphs are useful for representing the control flow and the data dependencies of a program. They can also be used in learning tasks, such as classifying applications given their raw code, predicting the best-performing compute device (e.g., CPU or GPU), or predicting the optimal thread coarsening factor. This paper investigates the performance of graph neural networks on classifying applications given their raw code, for different types of graphs extracted from LLVM's intermediate representation. The results indicate that adding new (different) edges and/or nodes does not necessarily improve performance, and that a compact representation tends to achieve the best performance. This investigation yields three main contributions: (1) an infrastructure to explore such graphs on different tasks, (2) compact graphs built from LLVM's intermediate representation, and (3) a detailed evaluation of different types of graphs on a learning task.

Keywords: Program Reasoning, Compilers, Graph, LLVM, Neural Network