Enhancing Text Recognition in OCR Systems Through Image Processing with BSRGAN

Abstract


Context: Image enhancement is essential to Optical Character Recognition (OCR), a technology used across Information Systems (IS) to extract text accurately from scanned documents, IDs, invoices, and other document types.

Problem: Despite OCR’s importance, noise, variable illumination, and low-resolution scans often degrade recognition quality, producing distorted and inaccurate text. These errors undermine the reliability and effectiveness of the IS that depend on OCR.

Solution: This study presents a methodology that improves the quality of low-resolution images by combining image filtering with OpenCV, super-resolution with the BSRGAN model, and character extraction with EasyOCR.

IS Theory: The research is anchored in Information Quality theory in IS, which emphasizes improving input data to enhance system outputs and reliability.

Method: The proposed methodology consists of two main stages. First, low-resolution images are processed by the BSRGAN super-resolution model, which raises image quality ahead of recognition. Then, the enhanced images are passed to an OCR system that extracts the characters and converts them into text. Validation was conducted on three datasets, Brazilian Identity Document (BID), IIIT 5K-Word, and SVHN, to simulate real-world application conditions.

Summary of Results: The results demonstrate that the proposed methodology improves OCR accuracy, significantly reducing error rates in various contexts.

Contributions and Impact on IS: This work contributes to the IS field with a solution that improves OCR input quality. It benefits academia by advancing image processing research and benefits industry by enabling more reliable text recognition in practical applications.
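The stages described in the Method can be sketched in Python as follows. This is a minimal illustration and not the authors' implementation. The function names, the bilateral filter and its parameters, the English language list for EasyOCR, and the choice to filter before super-resolution are all assumptions. The BSRGAN model is passed in as a caller-supplied function (`super_resolve`, hypothetical) because the abstract does not describe how the model is loaded. A character error rate based on edit distance is included as one common way to measure the error reduction the abstract reports; the specific metrics used in the study are not stated here.

```python
from typing import Any, Callable, Optional

Image = Any  # e.g. a NumPy array as returned by cv2.imread


def denoise_with_opencv(image: Image) -> Image:
    """Filtering stage. The bilateral filter and its parameters are assumptions."""
    import cv2  # imported lazily so the rest of the sketch runs without OpenCV

    # Bilateral filtering smooths noise while preserving character edges.
    return cv2.bilateralFilter(image, 9, 75, 75)


def read_with_easyocr(image: Image) -> str:
    """OCR stage with EasyOCR. The language list is an assumption."""
    import easyocr  # imported lazily; model weights are fetched on first use

    reader = easyocr.Reader(["en"])
    # detail=0 returns only the recognized strings, without boxes or scores.
    return " ".join(reader.readtext(image, detail=0))


def enhance_and_read(
    image: Image,
    super_resolve: Callable[[Image], Image],
    recognize: Callable[[Image], str],
    prefilter: Optional[Callable[[Image], Image]] = None,
) -> str:
    """Run optional filtering, then super-resolution, then OCR.

    `super_resolve` stands in for a BSRGAN wrapper supplied by the caller.
    """
    filtered = prefilter(image) if prefilter is not None else image
    return recognize(super_resolve(filtered))


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(
                min(
                    previous[j] + 1,  # deletion
                    current[j - 1] + 1,  # insertion
                    previous[j - 1] + (char_a != char_b),  # substitution
                )
            )
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance between OCR output and ground truth, per reference character."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

With a BSRGAN wrapper available, a call such as `enhance_and_read(img, super_resolve=my_bsrgan, recognize=read_with_easyocr, prefilter=denoise_with_opencv)` would run the full chain, where `my_bsrgan` is a hypothetical name. Comparing `character_error_rate` on the OCR output of the original and the enhanced image then gives a per-image measure of the improvement.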

Keywords: Optical Character Recognition, Image Enhancement, Improved Character Recognition, Deep Learning, Information Retrieval

Published
19/05/2025
LIMA, Fernando Baptistella de; SILVA, Eraylson Galdino da. Enhancing Text Recognition in OCR Systems Through Image Processing with BSRGAN. In: SIMPÓSIO BRASILEIRO DE SISTEMAS DE INFORMAÇÃO (SBSI), 21., 2025, Recife/PE. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 497-505. DOI: https://doi.org/10.5753/sbsi.2025.246551.