Duplicate Management Using a Graph-Oriented DBMS: A Case Study
Abstract
Multiple representations of the same information object, referred to as duplicates, are a ubiquitous problem in information system databases. Besides corrupting analysis results, such inconsistencies may compromise applications that need to correlate information from different sources, such as auditing and fraud detection systems. The traditional approach to this problem has two steps: first, duplicates are identified, typically in a semi-automatic process; then, each group of duplicates is fused into a single consolidated representation. However, this strategy may cause information loss if records are erroneously classified as duplicates. This article presents a case study of a different approach, which uses graph database systems to model similarity relationships between data objects. In this way, possible duplicates can be identified dynamically using basic graph operations. The study was carried out at the Controladoria Geral do Estado de Goiás, as part of the development of an application to detect evidence of fraud in public bids. Initial results indicate that the proposed approach is effective and efficient in discovering links involving possible duplicates.
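The idea of keeping duplicates as linked records instead of fusing them can be illustrated with a minimal Python sketch. It is not the system described in the paper: the in-memory adjacency map stands in for a graph database, the string similarity from `difflib` and the 0.85 threshold are assumptions, and the record ids and company names are invented for illustration.

```python
from collections import deque
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical records; ids and names are invented for illustration.
records = {
    1: "Construtora Alfa Ltda",
    2: "Construtora Alfa LTDA.",
    3: "Beta Engenharia",
    4: "Construtora Alpha Ltda",
}


def similarity(a, b):
    """Case-insensitive string similarity in [0, 1] (an assumed measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def build_similarity_graph(records, threshold=0.85):
    """Add an undirected edge between every pair of records whose
    similarity reaches the (assumed) threshold. No record is merged."""
    graph = {rid: set() for rid in records}
    for (i, a), (j, b) in combinations(records.items(), 2):
        if similarity(a, b) >= threshold:
            graph[i].add(j)
            graph[j].add(i)
    return graph


def possible_duplicates(graph, start):
    """Breadth-first traversal over similarity edges: every record
    reachable from `start` is reported as a possible duplicate."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    seen.discard(start)
    return seen


graph = build_similarity_graph(records)
print(possible_duplicates(graph, 1))
```

Because the original records stay in the graph, a wrongly inferred link can be undone by deleting a single edge, which is the information-loss risk of fusion that the abstract points out.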
Keywords:
Graph Databases, Data Integration and Cleaning, Data Similarity, NoSQL, Fraud Detection Systems
Published
20/05/2019
How to Cite
VAZ, Robinson Vespucio; DE OLIVEIRA, Jones Dhyemison Quito; RIBEIRO, Leonardo Andrade. Gerenciamento de Duplicatas usando SGBD Orientado a Grafos: Um Estudo de Caso. In: SIMPÓSIO BRASILEIRO DE SISTEMAS DE INFORMAÇÃO (SBSI), 15., 2019, Aracajú. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2019. p. 391-398.