Portuguese Word Embeddings: Evaluating on Word Analogies and Natural Language Tasks

Nathan S. Hartmann; Erick R. Fonseca; Christopher D. Shulby; Marcos V. Treviso; Jéssica S. Rodrigues; Sandra M. Aluísio

Nathan S. Hartmann USP
Erick R. Fonseca USP
Christopher D. Shulby USP
Marcos V. Treviso USP
Jéssica S. Rodrigues UFSCar
Sandra M. Aluísio USP

Abstract

Word embeddings provide meaningful word representations efficiently and have therefore become common in Natural Language Processing systems. In this paper, we evaluate different word embedding models trained on a large Portuguese corpus that includes both Brazilian and European variants. We trained 31 word embedding models using FastText, GloVe, Wang2Vec and Word2Vec. We evaluated them intrinsically on syntactic and semantic analogies and extrinsically on POS tagging and sentence semantic similarity tasks. The results suggest that (i) word analogies are not appropriate for evaluating word embeddings, and task-specific evaluations may be a better option; (ii) Wang2Vec appears to be a robust model; and (iii) for models with more than 300 dimensions, the gain in performance in our evaluations is not worth the increase in memory usage.
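
The intrinsic evaluation mentioned above relies on analogy questions of the form "a is to b as c is to ?". The following is a minimal sketch of how such a question can be scored, assuming the commonly used vector-offset (3CosAdd) method; the abstract does not state which scoring method the authors used. The toy vectors and the names TOY_VECTORS, solve_analogy and analogy_accuracy are hypothetical and invented for illustration only.

```python
import numpy as np

# Hypothetical 3-dimensional toy vectors, invented for illustration only.
# Real embedding models typically have 50 to 1000 dimensions.
TOY_VECTORS = {
    "rei":    np.array([0.9, 0.8, 0.1]),
    "rainha": np.array([0.9, 0.1, 0.8]),
    "homem":  np.array([0.1, 0.9, 0.1]),
    "mulher": np.array([0.1, 0.1, 0.9]),
    "casa":   np.array([0.5, 0.5, 0.5]),
}


def solve_analogy(vectors, a, b, c):
    """Return the word d that best completes a : b :: c : d (3CosAdd)."""
    # Length-normalize so that dot products equal cosine similarities.
    unit = {w: v / np.linalg.norm(v) for w, v in vectors.items()}
    target = unit[b] - unit[a] + unit[c]
    target = target / np.linalg.norm(target)
    # The three query words are excluded from the candidate answers.
    candidates = [w for w in unit if w not in (a, b, c)]
    return max(candidates, key=lambda w: float(unit[w] @ target))


def analogy_accuracy(vectors, questions):
    """Fraction of (a, b, c, expected_d) questions answered correctly."""
    hits = sum(solve_analogy(vectors, a, b, c) == d for a, b, c, d in questions)
    return hits / len(questions)


questions = [
    ("homem", "rei", "mulher", "rainha"),
    ("rei", "homem", "rainha", "mulher"),
]
accuracy = analogy_accuracy(TOY_VECTORS, questions)
```

With real models, the same loop would run over a full analogy test set, and the resulting accuracy would be the intrinsic score that is compared against performance on downstream tasks such as POS tagging.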

Published

2 October 2017

How to Cite

HARTMANN, Nathan S.; FONSECA, Erick R.; SHULBY, Christopher D.; TREVISO, Marcos V.; RODRIGUES, Jéssica S.; ALUÍSIO, Sandra M. Portuguese Word Embeddings: Evaluating on Word Analogies and Natural Language Tasks. In: SIMPÓSIO BRASILEIRO DE TECNOLOGIA DA INFORMAÇÃO E DA LINGUAGEM HUMANA (STIL), 1., 2017, Uberlândia/MG. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2017. p. 122-131.