Yauti: A Tool for Morphosyntactic Analysis of Nheengatu within the Universal Dependencies Framework


This paper presents Yauti, a rule-based morphosyntactic analyzer for Nheengatu, an endangered Brazilian indigenous language. Its goal is to generate analyses in the CoNLL-U format of the Universal Dependencies (UD) framework. It has been developed in parallel with the Nheengatu treebank of the UD collection. For sentences consisting only of known and unambiguous words, the tool generally delivers good results. It obtained a labeled attachment score (LAS) of 73.2% on a version of the Nheengatu UD treebank in which all 1022 sentences were automatically provided with XPOS tags and a special annotation to handle non-lexicalized words.

Keywords: Universal Dependencies, Treebank, Corpus Annotation, Dependency Parsing, Morphological Generator, Syntactic Parsing, Morphological Parsing, Automatic Morphosyntactic Analysis, Part-of-Speech Tagging, Low-resource Language, Nheengatu, Tupian
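
For readers unfamiliar with the metric reported in the abstract, the following is a minimal sketch of how a labeled attachment score can be computed from two CoNLL-U analyses of the same text. It is not Yauti's code or the evaluation script used for the reported figure. The function names `read_arcs` and `las`, the helper `row`, and the English toy sentence are all hypothetical, and the sketch assumes that the gold and predicted files share the same tokenization and that full relation labels are compared; actual evaluation scripts may differ in such details.

```python
def row(*cols):
    """Build one tab-separated CoNLL-U line from its 10 columns (illustrative helper)."""
    return "\t".join(cols)


def read_arcs(conllu_text):
    """Return one (HEAD, DEPREL) pair per syntactic word in a CoNLL-U string."""
    arcs = []
    for line in conllu_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank separator lines and sentence-level comments
        cols = line.split("\t")
        if len(cols) != 10:
            raise ValueError(f"expected 10 columns, got {len(cols)}: {line!r}")
        token_id = cols[0]
        if "-" in token_id or "." in token_id:
            continue  # skip multiword-token ranges (e.g. 1-2) and empty nodes (e.g. 1.1)
        arcs.append((cols[6], cols[7]))  # columns 7 and 8: HEAD and DEPREL
    return arcs


def las(gold_text, pred_text):
    """Percentage of words whose predicted HEAD and DEPREL both match the gold analysis."""
    gold = read_arcs(gold_text)
    pred = read_arcs(pred_text)
    if len(gold) != len(pred):
        raise ValueError("gold and predicted analyses must have the same number of words")
    if not gold:
        return 0.0
    correct = sum(g == p for g, p in zip(gold, pred))
    return 100.0 * correct / len(gold)


# Made-up English toy sentence, used only to exercise the functions above.
GOLD = "\n".join([
    "# text = Dogs bark loudly",
    row("1", "Dogs", "dog", "NOUN", "_", "_", "2", "nsubj", "_", "_"),
    row("2", "bark", "bark", "VERB", "_", "_", "0", "root", "_", "_"),
    row("3", "loudly", "loudly", "ADV", "_", "_", "2", "advmod", "_", "_"),
    "",
])

# Same sentence with one wrong relation label on the third word.
PRED = GOLD.replace("advmod", "obl")

if __name__ == "__main__":
    print(f"LAS = {las(GOLD, PRED):.1f}%")
```

In this toy case two of the three words have both the correct head and the correct relation label, so the score is two thirds. A word with the right head but the wrong label counts as an error, which is what distinguishes LAS from the unlabeled attachment score.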


