Comparative Evaluation of Text Recognition Methods for Field Extraction in Brazilian National Driver’s Licenses
Abstract
As digitalization advances, demand for document information extraction has grown, yet studies focusing on the Brazilian National Driver’s License (CNH) remain scarce. This paper addresses that gap by evaluating OCR methods, including TrOCR (under varying settings), Tesseract, and GPT models, for extracting information from CNH documents. The study accounts for challenges such as lighting and variable image quality, and uses metrics such as error rate and field-level recognition performance. The results show promising performance with open-source tools and offer insights into the advantages and limitations of each model.
References
Appalaraju, S., Jasani, B., Kota, B. U., Xie, Y., and Manmatha, R. (2021). DocFormer: End-to-end transformer for document understanding. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 993–1003.
Attivissimo, F., Giaquinto, N., Scarpetta, M., and Spadavecchia, M. (2019). An automatic reader of identity documents. In 2019 IEEE International Conference on Systems, Man and Cybernetics (SMC), pages 3525–3530. IEEE.
Bao, H., Dong, L., Piao, S., and Wei, F. (2021). BEiT: BERT Pre-Training of Image Transformers. arXiv preprint arXiv:2106.08254.
Baviskar, D., Ahirrao, S., Potdar, V., and Kotecha, K. (2021). Efficient automated processing of the unstructured documents using artificial intelligence: A systematic literature review and future directions. IEEE Access, 9:72894–72936.
Carta, S., Giuliani, A., Piano, L., and Tiddia, S. G. (2024). An end-to-end OCR-free solution for identity document information extraction. Procedia Computer Science, 246:453–462. 28th International Conference on Knowledge Based and Intelligent Information and Engineering Systems (KES 2024).
Castelblanco, A., Solano, J., Lopez, C., Rivera, E., Tengana, L., and Ochoa, M. (2020). Machine learning techniques for identity document verification in uncontrolled environments: A case study. In Pattern Recognition, pages 271–281, Cham. Springer International Publishing.
Chandra, A. and Stefanus, R. (2021). An end-to-end optical character recognition pipeline for Indonesian identity card. In 2021 9th International Conference on Information and Communication Technology (ICoICT), pages 307–312.
de Sá Soares, A., das Neves Junior, R. B., and Bezerra, B. L. D. (2020). BID Dataset: a challenge dataset for document processing tasks. In Anais Estendidos do XXXIII Conference on Graphics, Patterns and Images, pages 143–146. SBC.
Gov, A. (2024). E-commerce no Brasil cresce 4% e alcança R$ 196 bi em 2023. [link] Accessed: December 9, 2024.
Hoai, D. P. V., Duong, H.-T., and Hoang, V. T. (2021). Text recognition for Vietnamese identity card based on deep features network. International Journal on Document Analysis and Recognition (IJDAR), 24(2):123–131.
Hochreiter, S. and Schmidhuber, J. (1997). Long Short-Term Memory. Neural Computation, 9(8):1735–1780.
Huang, Z., Chen, K., He, J., Bai, X., Karatzas, D., Lu, S., and Jawahar, C. V. (2021). ICDAR2019 competition on scanned receipt OCR and information extraction. CoRR, abs/2103.10213.
Jitsi (2024). JiWER: A Python package for word error rate and character error rate computation. [link] Accessed: December 6, 2024.
Jocher, G., Chaurasia, A., and Qiu, J. (2023). Ultralytics YOLOv8. [link] Accessed: November 20, 2024.
Li, M., Lv, T., Chen, J., Cui, L., Lu, Y., Florencio, D., Zhang, C., Li, Z., and Wei, F. (2023). TrOCR: Transformer-based optical character recognition with pre-trained models. In The Thirty-Seventh AAAI Conference on Artificial Intelligence (AAAI-23).
Li, Y., Qian, Y., Yu, Y., Qin, X., Zhang, C., Liu, Y., Yao, K., Han, J., Liu, J., and Ding, E. (2021). StrucTexT: Structured text understanding with multi-modal transformers. In Proceedings of the 29th ACM International Conference on Multimedia, pages 1912–1920.
Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., and Stoyanov, V. (2019). RoBERTa: A Robustly Optimized BERT Pretraining Approach. arXiv preprint arXiv:1907.11692.
Neudecker, C., Baierer, K., Gerber, M., Clausner, C., Antonacopoulos, A., and Pletschacher, S. (2021). A survey of OCR evaluation tools and metrics. In Proceedings of the 6th International Workshop on Historical Document Imaging and Processing, HIP ’21, pages 13–18, New York, NY, USA. Association for Computing Machinery.
Planalto (2018). Lei nº 13.709, de 14 de agosto de 2018. [link] Accessed: November 25, 2024.
Radford, A., Narasimhan, K., Salimans, T., Sutskever, I., et al. (2018). Improving language understanding by generative pre-training.
Redmon, J. (2016). You only look once: Unified, real-time object detection. In Proceedings of the IEEE conference on computer vision and pattern recognition.
Smith, R. (2007). An overview of the Tesseract OCR engine. In Ninth International Conference on Document Analysis and Recognition (ICDAR 2007), volume 2, pages 629–633. IEEE.
Subramani, N., Matton, A., Greaves, M., and Lam, A. (2021). A survey of deep learning approaches for OCR and document understanding.
Touvron, H., Cord, M., Douze, M., Massa, F., Sablayrolles, A., and Jégou, H. (2021). Training data-efficient image transformers & distillation through attention. In International conference on machine learning, pages 10347–10357. PMLR.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., and Polosukhin, I. (2017). Attention is all you need. In Advances in Neural Information Processing Systems, volume 30, pages 5998–6008.
Wang, W., Wei, F., Dong, L., Bao, H., Yang, N., and Zhou, M. (2020). MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers. Advances in Neural Information Processing Systems, 33:5776–5788.
Wojcik, L., Coelho, L., Granada, R., Führ, G., and Menotti, D. (2023). NBID dataset: Towards robust information extraction in official documents. In 2023 36th SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI), pages 145–150.
Yu, W., Lu, N., Qi, X., Gong, P., and Xiao, R. (2021). PICK: Processing key information extraction from documents using improved graph learning-convolutional networks. In 2020 25th International Conference on Pattern Recognition (ICPR), pages 4363–4370. IEEE.
Published
2025-07-20
How to Cite
MATTOS, Suziane L. R. R.; BOLDT, Francisco de A.; PAIXÃO, Thiago M. Comparative Evaluation of Text Recognition Methods for Field Extraction in Brazilian National Driver’s Licenses. In: INTEGRATED SOFTWARE AND HARDWARE SEMINAR (SEMISH), 52., 2025, Maceió/AL. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 97-108. ISSN 2595-6205. DOI: https://doi.org/10.5753/semish.2025.7349.
