Towards Zero-Shot Document Image Classification
Abstract
Classification is a fundamental tool for automating document categorization in many real-world applications, such as information management, financial document processing, healthcare records management, news categorization, fraud detection, and regulatory compliance. Because of this broad range of applications, document classification matters to a wide variety of companies. However, documents often change in format and visual pattern, which can degrade a simple classification model. Moreover, model maintenance and retraining often demand significant effort, consuming computational resources and requiring new data. Techniques that can classify documents by simply observing new data, without retraining the classifier, are therefore valuable across many applications. In this context, Zero-Shot Learning (ZSL) is especially suitable for document classification because it handles diverse and ever-changing document content. In this work, we address the gap in Zero-Shot Document Image Classification (ZS-DIC), where we classify documents that the model has not seen during training. To achieve this, we built Layout-Aware Complex Document Information Processing (LA-CDIP), a dataset tailored to this problem. LA-CDIP prioritizes structural consistency, allowing models to classify documents correctly under a ZSL scenario. To benchmark this dataset, we developed a series of Siamese Neural Networks (NNs) based on a variety of computer vision architectures, such as ResNet, EfficientNet, and ViT. As a result, the proposed ZSL-based method achieves Equal Error Rates (EERs) under 5%. The code of the proposed method is available at https://github.com/ABMHub/doc-zsl.
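For readers unfamiliar with the Equal Error Rate metric reported above, the following is a minimal sketch of how an EER can be computed from pairwise distances such as those a Siamese network produces. It is illustrative only and is not taken from the authors' repository: the function name `equal_error_rate`, the convention that a smaller distance means "same class", and the toy values are all assumptions.

```python
from typing import Sequence, Tuple


def equal_error_rate(
    distances: Sequence[float], same_class: Sequence[bool]
) -> Tuple[float, float]:
    """Return (eer, threshold) for a set of scored document pairs.

    Assumption: a pair is accepted as "same class" when its distance is
    less than or equal to the threshold.
    """
    genuine = [d for d, s in zip(distances, same_class) if s]
    impostor = [d for d, s in zip(distances, same_class) if not s]
    if not genuine or not impostor:
        raise ValueError("need at least one genuine and one impostor pair")

    best_gap, best_eer, best_thr = float("inf"), 1.0, 0.0
    # Sweep every observed distance as a candidate decision threshold.
    for thr in sorted(set(distances)):
        # False acceptance rate: impostor pairs wrongly accepted.
        far = sum(d <= thr for d in impostor) / len(impostor)
        # False rejection rate: genuine pairs wrongly rejected.
        frr = sum(d > thr for d in genuine) / len(genuine)
        gap = abs(far - frr)
        if gap < best_gap:
            # The EER is approximated where the two error rates are closest.
            best_gap, best_eer, best_thr = gap, (far + frr) / 2, thr
    return best_eer, best_thr


# Toy example with made-up distances for eight document pairs.
toy_distances = [0.1, 0.2, 0.3, 0.8, 0.4, 0.7, 0.9, 0.25]
toy_same_class = [True, True, True, True, False, False, False, False]
eer, threshold = equal_error_rate(toy_distances, toy_same_class)
print(f"EER = {eer:.2f} at threshold {threshold}")
```

On this toy data, one genuine pair and one impostor pair fall on the wrong side of the chosen threshold, so both error rates equal 25%. An EER under 5% means that, at the operating point where the two error rates coincide, fewer than one pair in twenty is misjudged in either direction.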
