Evaluation of Optical Character Recognition (OCR) Systems Dealing with Misinformation in Portuguese

  • Yago Santos UFV
  • Michel Silva UFV
  • Julio C. S. Reis UFV

Abstract


The performance of OCR techniques depends heavily on the application context and the language being processed. Studies focused on languages such as Brazilian Portuguese (Pt-Br) and on specific contexts are still scarce. In this work, we therefore present an extensive analysis of the performance of OCR systems on Brazilian Portuguese text, in the context of detecting misinformation spread through images on social platforms. To do so, we build a synthetic dataset that combines texts from a labeled Pt-Br fact-checking dataset with common patterns of images frequently shared on social media and messaging apps. Our results reveal how the analyzed image aspects influence OCR accuracy, highlighting those with the greatest impact. Further, we report considerable variation in performance among the evaluated OCR systems.
Published
2023-11-06
SANTOS, Yago; SILVA, Michel; REIS, Julio C. S. Evaluation of Optical Character Recognition (OCR) Systems Dealing with Misinformation in Portuguese. In: CONFERENCE ON GRAPHICS, PATTERNS AND IMAGES (SIBGRAPI), 36., 2023, Rio Grande/RS. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2023. p. 223-228.