A Importância dos Falsos Homógrafos para a Correção Automática de Erros Ortográficos em Português

  • Magali Sanches Duran USP
  • Lucas Vinícius Avanço USP
  • Maria das Graças Volpe Nunes USP

Abstract


This paper reports an analysis of 25,722 pairs of Portuguese words that differ from each other by a single diacritic, here called “false homographs”. Such pairs matter for spelling correction: when a diacritic is omitted, the misspelled word is identical to another correct word, so the error is neither detected nor corrected. The purpose of the analysis is to identify non-accented words that are relatively less frequent than their accented counterparts and to exclude them from the lexicon used by a Portuguese spell checker. This strategy is especially justified for correcting User-Generated Content (UGC), a kind of text characterized by missing diacritics, among other features. The result is a list of 2,052 words that meet the requirements of this strategy.
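
The strategy described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy lexicon, the frequency counts, and the ratio threshold of 10 are all assumptions made for illustration, and the abstract does not say how "relatively less frequent" was measured. The sketch also assumes that a pair consists of a word with no diacritics and a word with exactly one.

```python
import unicodedata


def strip_diacritics(word):
    """Return the word with all combining marks (accents, cedilla, tilde) removed."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def count_diacritics(word):
    """Count the combining marks in the decomposed form of the word."""
    decomposed = unicodedata.normalize("NFD", word)
    return sum(1 for ch in decomposed if unicodedata.combining(ch))


def false_homograph_pairs(lexicon):
    """Find (unaccented, accented) pairs that differ by a single diacritic."""
    words = set(lexicon)
    pairs = []
    for word in sorted(words):
        if count_diacritics(word) == 1:
            base = strip_diacritics(word)
            if base in words:
                pairs.append((base, word))
    return pairs


def words_to_exclude(pairs, freq, ratio=10.0):
    """Select unaccented words that are much rarer than their accented pair.

    `ratio` is a hypothetical threshold, not a value taken from the paper.
    """
    excluded = set()
    for plain, accented in pairs:
        if freq.get(accented, 0) > ratio * freq.get(plain, 0):
            excluded.add(plain)
    return sorted(excluded)


# Toy lexicon and made-up corpus counts, for illustration only.
lexicon = ["ate", "até", "esta", "está", "historia", "história"]
freq = {"ate": 3, "até": 5000, "esta": 4000, "está": 6000,
        "historia": 2, "história": 3000}

pairs = false_homograph_pairs(lexicon)
excluded = words_to_exclude(pairs, freq)
print(excluded)
```

With these made-up counts, "ate" and "historia" would be removed from the speller's lexicon, so that typing them triggers a correction to "até" and "história". The pair "esta"/"está" is kept intact because both forms are frequent.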

Published
2015-11-04
DURAN, Magali Sanches; AVANÇO, Lucas Vinícius; NUNES, Maria das Graças Volpe. A Importância dos Falsos Homógrafos para a Correção Automática de Erros Ortográficos em Português. In: BRAZILIAN SYMPOSIUM IN INFORMATION AND HUMAN LANGUAGE TECHNOLOGY (STIL), 1., 2015, Natal/RN. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2015. p. 265-273.