Distributional Thesauri for Portuguese: An Evaluation of Methodologies (Tesauros Distribucionais para o Português: avaliação de metodologias)

  • Rodrigo Wilkens UFRGS
  • Leonardo Zilio UFRGS
  • Eduardo Ferreira UFRGS
  • Gabriel Gonçalves UFRGS
  • Aline Villavicencio UFRGS

Abstract


Recent decades have seen growing interest in methods for automatically constructing distributional thesauri from corpora. Efforts to systematically evaluate and improve the resulting thesauri have been made for languages such as English and French, but comparable initiatives are still urgently needed for Portuguese. This paper presents a comparative investigation of the two main approaches to thesaurus generation, count-based and predictive methods, with a focus on Portuguese. For the evaluation, we propose a TOEFL-like test for Portuguese, covering nouns and verbs, which was automatically generated from BabelNet.

Published
2015-11-04
WILKENS, Rodrigo; ZILIO, Leonardo; FERREIRA, Eduardo; GONÇALVES, Gabriel; VILLAVICENCIO, Aline. Tesauros Distribucionais para o Português: avaliação de metodologias. In: BRAZILIAN SYMPOSIUM IN INFORMATION AND HUMAN LANGUAGE TECHNOLOGY (STIL), 1., 2015, Natal/RN. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2015. p. 131-140.