Enhancing Graph Data Quality by Leveraging Heterogeneous Node Features and Embeddings
Abstract
Heterogeneous Graphs are important data sources because they offer a rich representation of knowledge, based primarily on node features and relationships. These graphs commonly have significant data gaps, particularly in the nodes. Graph Neural Networks are state-of-the-art solutions that achieve excellent results by extracting information from node relationships. However, they are severely limited when the graph elements carry no information, which weakens the resulting representation. This paper proposes specifications and an algorithm to process different types of node features, such as text, images, and subgraphs, generating both single and composition embeddings. To evaluate the effectiveness of the proposed algorithm, experiments were conducted to generate the features and their respective node embeddings in a Heterogeneous Graph. Performance was measured as the average Accuracy and F1-Score, together with their Standard Deviations, on Recommender System tasks applied to the embeddings generated in the experiments. Notably, on the Node Classification task, the composition of Aggregated Features with Metapaths embeddings achieved an F1-Score of 83.66%, outperforming the 60.70% achieved by the approach without embeddings.
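As a rough illustration of the two ideas named in the abstract, a composition embedding and a mean/standard-deviation summary of a metric, the following minimal Python sketch may help. It is not the authors' algorithm: the abstract does not state how embeddings are composed, so concatenation is an assumption here, as is the use of the sample standard deviation. The function names `compose_embeddings` and `summarize`, and all vectors and scores, are hypothetical placeholders rather than values from the paper.

```python
from statistics import mean, stdev


def compose_embeddings(*embeddings):
    """Concatenate per-feature embeddings of one node into a single vector.

    Assumption: composition by concatenation; the paper may use another operator.
    """
    composed = []
    for emb in embeddings:
        composed.extend(emb)
    return composed


def summarize(scores):
    """Return (mean, sample standard deviation) of per-run scores."""
    return mean(scores), stdev(scores)


# Hypothetical embeddings for one node (made-up numbers).
text_emb = [0.1, 0.2]
metapath_emb = [0.3, 0.4, 0.5]
node_vector = compose_embeddings(text_emb, metapath_emb)

# Hypothetical F1 scores from three runs (not results from the paper).
f1_runs = [0.80, 0.82, 0.84]
f1_mean, f1_std = summarize(f1_runs)
print(node_vector, round(f1_mean, 4), round(f1_std, 4))
```

The composed vector simply has the combined length of its parts, so a downstream classifier sees every feature type at once; whether that matches the paper's composition step is left open.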
Published
17/11/2024
How to Cite
ANGONESE, Silvio Fernando; GALANTE, Renata. Enhancing Graph Data Quality by Leveraging Heterogeneous Node Features and Embeddings. In: BRAZILIAN CONFERENCE ON INTELLIGENT SYSTEMS (BRACIS), 13., 2024, Belém/PA. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2024. p. 383-398. ISSN 2643-6264.