Mechanism for Structuring the Data from a Generic Identity Document Image using Semantic Analysis

  • José C. Gutiérrez USP
  • Rodolfo Valiente USP
  • Marcelo T. Sadaike USP
  • Daniel F. Soriano USP
  • Graça Bressan USP
  • Wilson V. Ruggiero USP


Nowadays, the enormous variety of identity documents makes it difficult to standardize a system capable of extracting all the information of interest they contain. Systems that use templates to classify information by its position are therefore limited by the number of templates they can recognize. This paper presents a novel mechanism intended to automatically classify the main information of interest in generic identity documents. The proposal is designed to be easily adaptable to any system capable of detecting and extracting text from an identity document image. To assign meaning to the extracted text, the proposal structures the data using semantic analysis. The mechanism consists of two main steps: first, all the text data are classified as sentences or near sentences based on the Euclidean distance between words; second, the sentences are analyzed to find keywords that allow the information to be structured according to its semantics and presented as abstractions. The proposal is designed to store the data as abstractions of its meaning. This improves the scalability of the system and allows better use of the information by different services, by the end user, or by an automated decision-making process.
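The two steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the grouping rule, the distance threshold, or the keyword list, so the `Word` type, the single-link grouping in `group_words`, the `max_dist` parameter, and the `KEYWORDS` table are all hypothetical choices made for illustration. Word positions are assumed to come from an external text detector as bounding-box centres.

```python
from dataclasses import dataclass
from math import hypot


@dataclass
class Word:
    text: str
    x: float  # assumed: x of the word's bounding-box centre, in pixels
    y: float  # assumed: y of the word's bounding-box centre, in pixels


def group_words(words, max_dist):
    """Step 1 (sketch): put words into the same group when their Euclidean
    distance is at most max_dist, merging groups that a new word connects."""
    groups = []
    for w in words:
        merged, rest = [w], []
        for g in groups:
            if any(hypot(w.x - o.x, w.y - o.y) <= max_dist for o in g):
                merged.extend(g)  # w links to this group, so absorb it
            else:
                rest.append(g)
        groups = rest + [merged]
    # Simplified reading order: top to bottom, then left to right.
    return [sorted(g, key=lambda o: (o.y, o.x)) for g in groups]


# Hypothetical keyword table mapping field captions to abstract labels.
KEYWORDS = {"name": "NAME", "birth": "BIRTH_DATE", "number": "DOCUMENT_NUMBER"}


def structure(groups):
    """Step 2 (sketch): find a keyword in each group and store the words
    that follow it under the corresponding abstract label."""
    record = {}
    for g in groups:
        tokens = [o.text for o in g]
        for i, tok in enumerate(tokens):
            label = KEYWORDS.get(tok.lower().strip(":"))
            if label:
                record[label] = " ".join(tokens[i + 1:])
                break
    return record


if __name__ == "__main__":
    # Made-up detector output, deliberately out of reading order.
    words = [
        Word("Name:", 10, 10), Word("Silva", 110, 10), Word("Number:", 10, 100),
        Word("Maria", 60, 10), Word("12345", 60, 100),
    ]
    print(structure(group_words(words, max_dist=60)))
```

Because the output is keyed by labels such as `NAME` rather than by position on the page, a new document layout needs no new template in this sketch, only keywords that match its captions.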
How to Cite
GUTIÉRREZ, José C.; VALIENTE, Rodolfo; SADAIKE, Marcelo T.; SORIANO, Daniel F.; BRESSAN, Graça; RUGGIERO, Wilson V. Mechanism for Structuring the Data from a Generic Identity Document Image using Semantic Analysis. In: ANAIS PRINCIPAIS DO SIMPÓSIO BRASILEIRO DE SISTEMAS MULTIMÍDIA E WEB (WEBMEDIA), 23., 2017, Gramado. Anais Principais do XXIII Simpósio Brasileiro de Sistemas Multimídia e Web. Porto Alegre: Sociedade Brasileira de Computação, Oct. 2017. p. 213-216.