A Form Understanding Approach to Printed and Structured Engineering Documentation
Abstract
Many companies still depend on printed documents, such as healthcare reports, engineering specifications, and historical documents. These documents vary widely in layout and content, so each document structure requires a different approach, which makes information extraction costly and inefficient. We classify documents into three categories: non-structured, semi-structured, and structured; this work focuses on the last. We propose a pattern recognition method for structured documents that relies on an anchoring relationship between question and answer objects and uses a system of hypotheses and a probability distribution to identify which predefined model a document belongs to. The method therefore acts as a system for both identifying structured documents and extracting their content. It achieves promising pattern recognition results across all document models, with 78% to 97% of objects extracted correctly.
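The abstract describes the method only at a high level, so the following is a minimal Python sketch of the general idea under stated assumptions; it is not the authors' algorithm. It assumes text has already been detected and recognized as boxes with normalized top-left coordinates, and that each predefined model is a hypothetical template (`TEMPLATES`) mapping question labels to expected positions. Each template is treated as a hypothesis scored by the fraction of its question anchors found near their expected positions (`template_score`, tolerance `tol`); the scores are turned into a probability distribution with a softmax (`template_posterior`, parameter `temperature`), which is a swapped-in choice. Answers are then anchored to questions by taking the nearest box to the right on the same line (`extract_answers`, parameter `max_dy`). All of these names and parameters are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """A detected text snippet; (x, y) is its top-left corner, normalized to 0..1."""
    text: str
    x: float
    y: float


# Hypothetical predefined models: question label -> expected (x, y) position.
TEMPLATES = {
    "inspection_form": {"Equipment:": (0.1, 0.1), "Date:": (0.1, 0.2), "Inspector:": (0.1, 0.3)},
    "maintenance_log": {"Part No.:": (0.1, 0.1), "Date:": (0.5, 0.1), "Technician:": (0.1, 0.4)},
}


def template_score(boxes, anchors, tol=0.05):
    """Fraction of a template's question anchors found within `tol` of their expected position."""
    hits = 0
    for label, (ex, ey) in anchors.items():
        if any(b.text == label and abs(b.x - ex) <= tol and abs(b.y - ey) <= tol for b in boxes):
            hits += 1
    return hits / len(anchors)


def template_posterior(boxes, templates, temperature=0.1):
    """Softmax over per-template scores: one probability per hypothesis (assumed scheme)."""
    scores = {name: template_score(boxes, anchors) for name, anchors in templates.items()}
    exps = {name: math.exp(s / temperature) for name, s in scores.items()}
    total = sum(exps.values())
    return {name: e / total for name, e in exps.items()}


def extract_answers(boxes, anchors, max_dy=0.02):
    """Anchor each question to the nearest non-question box to its right on the same line."""
    answers = {}
    for label in anchors:
        question = next((b for b in boxes if b.text == label), None)
        if question is None:
            continue  # anchor missing from this document
        candidates = [
            b for b in boxes
            if b.x > question.x and abs(b.y - question.y) <= max_dy and b.text not in anchors
        ]
        if candidates:
            answers[label] = min(candidates, key=lambda b: b.x - question.x).text
    return answers


if __name__ == "__main__":
    # Hypothetical OCR output for a single filled-in inspection form.
    boxes = [
        Box("Equipment:", 0.1, 0.1), Box("Pump P-101", 0.3, 0.1),
        Box("Date:", 0.1, 0.2), Box("2021-10-18", 0.3, 0.2),
        Box("Inspector:", 0.1, 0.3), Box("J. Doe", 0.3, 0.3),
    ]
    probs = template_posterior(boxes, TEMPLATES)
    best = max(probs, key=probs.get)
    print(best, extract_answers(boxes, TEMPLATES[best]))
```

Identification and extraction share the same anchors here: the question labels that vote for a template are also the reference points from which answers are located, which mirrors the abstract's framing of a single system for both tasks.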
Keywords:
Layout, Medical services, Documentation, Companies, Information retrieval, Probability distribution, Pattern recognition, Form Understanding, Text Detection, Spatial Layout Analysis
Published
18/10/2021
How to Cite
SANTOS, Gabriel L.; SILVA, Vanessa T.; DALMOLIN, Laura A.; RODRIGUES, Ricardo N.; DREWS, Paulo L. J.; DUARTE FILHO, Nelson L. A Form Understanding Approach to Printed and Structured Engineering Documentation. In: CONFERENCE ON GRAPHICS, PATTERNS AND IMAGES (SIBGRAPI), 34., 2021, Online. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2021.