Mechanism for Structuring the Data from a Generic Identity Document Image using Semantic Analysis

José C. Gutiérrez; Rodolfo Valiente; Marcelo T. Sadaike; Daniel F. Soriano; Graça Bressan; Wilson V. Ruggiero

José C. Gutiérrez USP
Rodolfo Valiente USP
Marcelo T. Sadaike USP
Daniel F. Soriano USP
Graça Bressan USP
Wilson V. Ruggiero USP

Abstract

The wide variety of identity documents in use makes it difficult to standardize a system capable of extracting all the information of interest they contain. Systems that use templates to classify information by its position are limited by the number of templates they can recognize. This paper presents a novel mechanism for automatically classifying the main information of interest in generic identity documents. The proposal is designed to be easily adapted to any system capable of detecting and extracting text from an identity document image. To assign meaning to the extracted text, it structures the data using semantic analysis. The mechanism consists of two main steps: first, all the text data are grouped into sentences or near-sentences based on the Euclidean distance between words; second, the sentences are analyzed to find keywords that allow the information to be structured according to its semantics and presented as abstractions. Storing the data as abstractions of its meaning improves the scalability of the system and allows the information to be better used by different services, by the end user, or by an automated decision-making process.
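
The two steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how distances are measured or how keywords are matched, so the sketch assumes that each word is represented by the centre of its bounding box, that grouping is single-linkage with a fixed threshold, and that a field value is whatever follows a keyword in the same sentence. The names `Word`, `group_words`, `structure`, `max_dist` and the keyword table are all hypothetical.

```python
from dataclasses import dataclass
from math import hypot


@dataclass
class Word:
    text: str
    x: float  # assumed: horizontal centre of the word's bounding box
    y: float  # assumed: vertical centre of the word's bounding box


def group_words(words, max_dist):
    """Step 1 (sketch): join words whose centres lie within max_dist
    of each other (Euclidean distance) into the same sentence."""
    parent = list(range(len(words)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(words)):
        for j in range(i + 1, len(words)):
            dist = hypot(words[i].x - words[j].x, words[i].y - words[j].y)
            if dist <= max_dist:
                parent[find(i)] = find(j)

    groups = {}
    for i, word in enumerate(words):
        groups.setdefault(find(i), []).append(word)

    # Assumed reading order: left to right inside a group,
    # groups ordered from top to bottom.
    ordered = [sorted(g, key=lambda w: w.x) for g in groups.values()]
    ordered.sort(key=lambda g: min(w.y for w in g))
    return [" ".join(w.text for w in g) for g in ordered]


def structure(sentences, keywords):
    """Step 2 (sketch): look for a known keyword in each sentence and
    store the text that follows it under the corresponding field."""
    record = {}
    for sentence in sentences:
        tokens = sentence.split()
        for i, token in enumerate(tokens):
            field = keywords.get(token.lower().strip(":"))
            if field and i + 1 < len(tokens):
                record[field] = " ".join(tokens[i + 1:])
                break
    return record


# Hypothetical keyword table mapping surface words to abstract fields.
KEYWORDS = {"name": "name", "nome": "name", "birth": "birth_date"}

words = [
    Word("Name:", 10, 10), Word("Maria", 60, 10), Word("Silva", 110, 10),
    Word("Birth:", 10, 100), Word("1990-01-01", 70, 100),
]
sentences = group_words(words, max_dist=60)
record = structure(sentences, KEYWORDS)
```

Because the output is a dictionary keyed by abstract field names rather than by positions on the page, a new document layout only requires new keywords, not a new template. A real system would likely measure distance between bounding-box edges rather than centres, since long words have centres that are far apart.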
