Using Optical Character Recognition to Extract Text from Essay Images
Abstract
Automatic Essay Scoring is a Natural Language Processing task whose objective is to evaluate and score texts written in prose. One of its main difficulties is the lack of essay datasets annotated with the score obtained in each competence. This work therefore proposes an effective solution for extracting the text of essays written by students, using computer vision and optical character recognition techniques. The proposed approach segments the words in the image of an essay and processes each word image individually to recognize its text. Finally, it arranges all recognized words in the correct reading order, achieving moderate performance.
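The final step, arranging recognized words in reading order, can be illustrated with a simple geometric heuristic. This is a minimal sketch under stated assumptions, not the authors' method: it assumes each recognized word comes with an axis-aligned bounding box, and the `WordBox` type, the `reading_order` and `to_text` helpers, and the `line_tol` threshold are all hypothetical. Words whose vertical centers are close are grouped into the same line; lines are then read top to bottom and words left to right.

```python
from dataclasses import dataclass


@dataclass
class WordBox:
    """A recognized word with its axis-aligned bounding box, in pixels (hypothetical type)."""
    x: int  # left edge
    y: int  # top edge
    w: int  # width
    h: int  # height
    text: str  # recognizer output for this word image

    @property
    def cy(self) -> float:
        """Vertical center of the box, used to assign the word to a text line."""
        return self.y + self.h / 2


def reading_order(words: list[WordBox], line_tol: float = 0.5) -> list[list[WordBox]]:
    """Group words into text lines (top to bottom), each sorted left to right.

    Words are scanned by vertical center; a word joins the current line when its
    center lies within line_tol * (mean box height of that line) of the line's
    mean center, otherwise it starts a new line. line_tol is an assumed tuning knob.
    """
    lines: list[list[WordBox]] = []
    for word in sorted(words, key=lambda b: b.cy):
        if lines:
            current = lines[-1]
            mean_cy = sum(b.cy for b in current) / len(current)
            mean_h = sum(b.h for b in current) / len(current)
            if abs(word.cy - mean_cy) <= line_tol * mean_h:
                current.append(word)
                continue
        lines.append([word])
    # Within each line, reading order is left to right.
    return [sorted(line, key=lambda b: b.x) for line in lines]


def to_text(words: list[WordBox], line_tol: float = 0.5) -> str:
    """Join the ordered words: spaces between words, newlines between lines."""
    return "\n".join(
        " ".join(b.text for b in line) for line in reading_order(words, line_tol)
    )


if __name__ == "__main__":
    # Boxes given out of order, as a word detector might return them.
    sample = [
        WordBox(120, 12, 60, 20, "world"),
        WordBox(10, 10, 80, 22, "Hello"),
        WordBox(70, 50, 50, 22, "line"),
        WordBox(15, 52, 40, 20, "second"),
    ]
    print(to_text(sample))  # prints "Hello world", then "second line"
```

Grouping by a running mean of line centers is cheap and works when text lines are roughly horizontal; strongly slanted or curved handwritten lines would likely need deskewing or a more robust line-clustering step first.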
