Normalizador de Texto para Língua Portuguesa baseado em Modelo de Linguagem (Text Normalizer for the Portuguese Language Based on a Language Model)

  • Patrick Thiago Bard PUCRS
  • Renan Lopes Luis PUCRS
  • Silvia Maria Wanderley Moraes PUCRS

Abstract


Automatically processing user-generated content on the Internet is a major challenge, in part because such content is often written informally. This informality has motivated research on text normalization methods. Text normalization is a step that precedes the usual processing and converts user-written text into a 'standard' (more formal) written form. In this work, we prototype a normalizer for the Portuguese language that is based on a language model. In this approach, we use machine translation techniques to normalize the texts. We tested our normalizer on a corpus about politics and compared its results with those of another normalizer.

Published
2017-10-02
BARD, Patrick Thiago; LUIS, Renan Lopes; MORAES, Silvia Maria Wanderley. Normalizador de Texto para Língua Portuguesa baseado em Modelo de Linguagem. In: BRAZILIAN SYMPOSIUM IN INFORMATION AND HUMAN LANGUAGE TECHNOLOGY (STIL), 1., 2017, Uberlândia/MG. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2017. p. 142-150.