Duplicate Management Using a Graph-Oriented DBMS: A Case Study

Robinson Vespucio Vaz; Jones Dhyemison Quito de Oliveira; Leonardo Andrade Ribeiro

Robinson Vespucio Vaz, Controladoria Geral do Estado de Goiás
Jones Dhyemison Quito de Oliveira, Universidade Federal de Goiás
Leonardo Andrade Ribeiro, Universidade Federal de Goiás

Abstract

The presence of multiple representations of the same information object, referred to as duplicates, is a ubiquitous problem in information system databases. Besides corrupting analysis results, such inconsistencies may compromise applications that need to correlate information from different sources, such as auditing and fraud detection systems. The traditional approach to this problem has two steps: first, duplicates are identified in a typically semi-automatic process; then, each group of duplicates is fused into a single consolidated representation. However, this strategy may cause information loss when records are erroneously classified as duplicates, because the fused records are no longer available individually. This article presents a case study of a different approach, which uses graph database systems to model similarity relationships between data objects. Possible duplicates can then be identified dynamically using basic graph operations, without fusing any records. The study was carried out at the Controladoria Geral do Estado de Goiás, as part of the development of an application to detect evidence of fraud in public bids. Initial results indicate that the proposed approach is effective and efficient in discovering links involving possible duplicates.

Keywords: Graph Databases, Data Integration and Cleaning, Data Similarity, NoSQL, Fraud Detection Systems
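
The idea summarized in the abstract, keeping every record and linking similar ones with edges instead of fusing them, can be illustrated with a minimal sketch. It is not the authors' implementation: it uses plain in-memory Python dictionaries rather than a graph database system, and the Jaccard word-set similarity, the 0.6 threshold, the function names, and the sample supplier names are all assumptions made for illustration.

```python
from collections import defaultdict, deque


def jaccard(a, b):
    """Jaccard similarity between the lowercase word sets of two strings."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def build_similarity_graph(records, threshold):
    """Return adjacency maps with one scored edge per sufficiently similar pair.

    No record is merged or deleted; similarity is stored only as edges.
    """
    graph = defaultdict(dict)
    ids = sorted(records)
    for i, u in enumerate(ids):
        for v in ids[i + 1:]:
            score = jaccard(records[u], records[v])
            if score >= threshold:
                graph[u][v] = score
                graph[v][u] = score
    return graph


def possible_duplicates(graph, start):
    """Breadth-first traversal over similarity edges starting at `start`."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, {}):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen - {start}


# Hypothetical supplier records (illustrative data, not from the study).
records = {
    "s1": "Acme Construction Ltd",
    "s2": "Acme Construction Limited",
    "s3": "Acme Construction",
    "s4": "Goias Office Supplies",
}

graph = build_similarity_graph(records, threshold=0.6)
candidates = possible_duplicates(graph, "s1")
```

In this sketch, `s1` and `s2` fall below the threshold when compared directly, yet `s2` is still reached from `s1` through `s3`. Because the original records are never fused, a link that later turns out to be wrong can be dropped by removing an edge, with no loss of information.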
