Evaluation of Optical Character Recognition (OCR) Systems Dealing with Misinformation in Portuguese
Abstract
The performance of OCR techniques is highly dependent on the application context and the language being processed. Studies focused on languages such as Brazilian Portuguese (Pt-Br) and on specific contexts are still scarce. In this work, we therefore present an extensive analysis of the performance of OCR systems on Brazilian Portuguese text, in the context of detecting misinformation spread through images on social platforms. To do this, we build a synthetic dataset that combines texts from a labeled Pt-Br fact-checking dataset with common patterns of images frequently shared on social media and messaging apps. Our results reveal how the analyzed image aspects influence OCR accuracy and highlight those with the greatest impact. Further, we report considerable variation in performance among the evaluated OCR systems.
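As an illustration of how OCR accuracy can be quantified, the sketch below computes a character error rate (CER) from the Levenshtein edit distance between a reference text and an OCR output. The abstract does not state which metric the authors used, so CER is an assumption here, and the function names `edit_distance` and `character_error_rate` are hypothetical.

```python
import unicodedata


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, ocr_output: str) -> float:
    """CER = edit distance / reference length (assumed metric, not the paper's)."""
    # Normalize to NFC so accented Portuguese letters count as single characters.
    reference = unicodedata.normalize("NFC", reference)
    ocr_output = unicodedata.normalize("NFC", ocr_output)
    if not reference:
        raise ValueError("reference text must be non-empty")
    return edit_distance(reference, ocr_output) / len(reference)
```

A lower CER means the OCR output is closer to the reference. Dropped diacritics, a plausible failure mode for Portuguese text, each count as one substitution under this measure.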
Published
06/11/2023
How to Cite
SANTOS, Yago; SILVA, Michel; REIS, Julio C. S. Evaluation of Optical Character Recognition (OCR) Systems Dealing with Misinformation in Portuguese. In: CONFERENCE ON GRAPHICS, PATTERNS AND IMAGES (SIBGRAPI), 36., 2023, Rio Grande/RS. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2023. p. 223-228.