A Language-Model-Based Text Normalizer for Portuguese
Abstract
Automatic processing of user-generated content on the Internet is a major challenge, in part because such content is written informally. This informality has motivated research on methods for text normalization. Text normalization is a preprocessing step that precedes the usual processing pipeline, converting user-written text into a 'standard' (more formal) written form. In this work, we prototype a normalizer for the Portuguese language that is based on a language model. This approach applies machine translation techniques to normalize the texts. We tested our normalizer on a corpus about politics and compared its results with those of another normalizer.
