Mechanism for Structuring the Data from a Generic Identity Document Image using Semantic Analysis

  • José C. Gutiérrez USP
  • Rodolfo Valiente USP
  • Marcelo T. Sadaike USP
  • Daniel F. Soriano USP
  • Graça Bressan USP
  • Wilson V. Ruggiero USP

Abstract

Today, the enormous variety of identity documents makes it difficult to standardize a system capable of extracting all the information of interest they contain. Systems that use templates to classify information by its position are limited by the number of templates they can recognize. This paper therefore presents a novel mechanism for automatically classifying the main information of interest in generic identity documents. The proposal is designed to be easily adapted to any system capable of detecting and extracting text from an identity document image. To assign meaning to the extracted text, it structures the data using semantic analysis. The mechanism consists of two main steps: first, all the text data are grouped into sentences or near-sentences based on the Euclidean distance between words; second, the sentences are analyzed to find keywords that allow the information to be structured according to its semantics and presented as abstractions. Storing the data as abstractions of its meaning improves the scalability of the system and lets the information be better used by different services, by the end user, or by an automated decision-making process.
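
The two steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Word` record, the `max_dist` threshold, the single-linkage grouping, the assumption that each group lies on one text line, and the `KEYWORDS` table are all hypothetical choices made for illustration. The abstract states only that words are grouped by Euclidean distance and that keywords are then used to structure the sentences.

```python
from dataclasses import dataclass
from math import hypot


@dataclass
class Word:
    text: str
    x: float  # assumed: centre of the word's bounding box, in pixels
    y: float


def group_into_sentences(words, max_dist):
    """Step 1 (sketch): words closer than max_dist end up in the same group."""
    parent = list(range(len(words)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(words)):
        for j in range(i + 1, len(words)):
            dist = hypot(words[i].x - words[j].x, words[i].y - words[j].y)
            if dist <= max_dist:
                parent[find(i)] = find(j)

    groups = {}
    for i, word in enumerate(words):
        groups.setdefault(find(i), []).append(word)

    # Assumption: each group is one text line, so left-to-right order is by x.
    ordered = sorted(
        groups.values(),
        key=lambda g: (min(w.y for w in g), min(w.x for w in g)),
    )
    return [" ".join(w.text for w in sorted(g, key=lambda w: w.x)) for g in ordered]


# Hypothetical keyword table mapping surface words to abstract field names.
KEYWORDS = {"name": "NAME", "number": "DOCUMENT_NUMBER"}


def structure(sentences, keywords):
    """Step 2 (sketch): the text after a known keyword becomes that field's value."""
    record = {}
    for sentence in sentences:
        tokens = sentence.split()
        for idx, token in enumerate(tokens):
            key = token.strip(":").lower()
            if key in keywords:
                record[keywords[key]] = " ".join(tokens[idx + 1:])
                break
    return record


# Made-up OCR output: two text lines, 70 pixels apart vertically.
words = [
    Word("Name:", 10, 10), Word("Maria", 60, 10), Word("Silva", 110, 10),
    Word("Number:", 10, 80), Word("12345", 70, 80),
]
sentences = group_into_sentences(words, max_dist=65)
record = structure(sentences, KEYWORDS)
```

With the made-up coordinates above, the two lines are farther apart than `max_dist`, so they form separate sentences, and each sentence yields one labelled field. Because the result is keyed by abstract field names rather than by page position, no per-document template is needed, which is the property the abstract emphasizes.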
Published
2017-10-17
How to Cite
GUTIÉRREZ, José C. et al. Mechanism for Structuring the Data from a Generic Identity Document Image using Semantic Analysis. Anais do Simpósio Brasileiro de Sistemas Multimídia e Web (WebMedia), [S.l.], p. 213-216, Oct. 2017. Available at: <https://sol.sbc.org.br/index.php/webmedia/article/view/5285>. Accessed: 14 May 2024.
