Automatic Annotation of Enhanced Universal Dependencies for Brazilian Portuguese

Abstract


This paper presents the first attempt to automatically annotate Enhanced Universal Dependencies for Brazilian Portuguese. We use a symbolic annotation system based on graph rewriting rules and modify its original rules to better suit the linguistic characteristics of Portuguese, using a manually annotated sample from the journalistic portion of the Porttinari treebank as ground truth. Our objective is to assess how well the automatic annotation performs on a new language and to determine how much it can be improved through rule modifications. The results show a substantial gain: linguistically driven rule adjustments improved annotation performance by 11.38 points, reaching an F1-score of 96.05%.

Keywords: Enhanced Dependencies, Universal Dependencies, syntactic annotation, corpus annotation, graph rewriting

Published
17/11/2024
DE SOUZA, Elvis A.; DURAN, Magali S.; NUNES, Maria das Graças V.; SAMPAIO, Gustavo; BELASCO, Giovanna; PARDO, Thiago A. S. Automatic Annotation of Enhanced Universal Dependencies for Brazilian Portuguese. In: SIMPÓSIO BRASILEIRO DE TECNOLOGIA DA INFORMAÇÃO E DA LINGUAGEM HUMANA (STIL), 15., 2024, Belém/PA. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2024. p. 217-226. DOI: https://doi.org/10.5753/stil.2024.245342.