The Obstacle of Structural Ambiguities in Language for Linguistically Motivated Language Models

  • João Pedro Gonçalves Munhoz (UFSCar)
  • Oto Araújo Vale (UFSCar)

Abstract


This article presents UDCode, a convention for encoding morphosyntactic and dependency information from Universal Dependencies (UD). The objective is to evaluate how the granularity of UD annotations for Brazilian Portuguese affects the recognition of temporal named entities. Exploratory tests revealed that the underspecified categorization of adverbs compromises precision and generates a high rate of false positives. This result indicates that the effectiveness of a linguistically motivated model depends on the level of detail in its annotations. The article concludes that future work should focus either on revising the annotation guidelines to include more refined adverbial categories or on methods that compensate for this lack of specificity.
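The link between coarse adverb tags and false positives can be illustrated with a small sketch. It is not UDCode and does not use the article's data: the token list, the gold labels, and the helper names `predict_time` and `precision` are all hypothetical. It only assumes that temporal, manner, place, and degree adverbs can share the same UPOS tag (`ADV`) and dependency relation (`advmod`), so a rule keyed on those two values alone cannot tell them apart.

```python
# Toy rows in the spirit of CoNLL-U columns: (FORM, UPOS, DEPREL).
# Hypothetical examples chosen for illustration, not taken from the article.
TOKENS = [
    ("ontem", "ADV", "advmod"),        # temporal ("yesterday")
    ("rapidamente", "ADV", "advmod"),  # manner ("quickly")
    ("aqui", "ADV", "advmod"),         # place ("here")
    ("hoje", "ADV", "advmod"),         # temporal ("today")
    ("muito", "ADV", "advmod"),        # degree ("very")
]

# Hypothetical gold annotation: only these forms are time expressions.
GOLD_TIME = {"ontem", "hoje"}


def predict_time(tokens):
    """Naive rule: treat every ADV attached as advmod as a time expression."""
    return {
        form
        for form, upos, deprel in tokens
        if upos == "ADV" and deprel == "advmod"
    }


def precision(predicted, gold):
    """Share of predicted items that are correct; 0.0 if nothing is predicted."""
    if not predicted:
        return 0.0
    return len(predicted & gold) / len(predicted)


predicted = predict_time(TOKENS)
false_positives = predicted - GOLD_TIME

print(sorted(false_positives))         # ['aqui', 'muito', 'rapidamente']
print(precision(predicted, GOLD_TIME))  # 0.4
```

In this toy setting the rule finds both time adverbs, but three of its five predictions are wrong, because nothing in the two annotation fields it reads separates time from manner, place, or degree. A finer adverbial category in the annotation, or an external lexicon of time adverbs, would be two ways to supply the missing distinction.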

Published
2025-09-29
MUNHOZ, João Pedro Gonçalves; VALE, Oto Araújo. The Obstacle of Structural Ambiguities in Language for Linguistically Motivated Language Models. In: BRAZILIAN SYMPOSIUM IN INFORMATION AND HUMAN LANGUAGE TECHNOLOGY (STIL), 16., 2025, Fortaleza/CE. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 645-653. DOI: https://doi.org/10.5753/stil.2025.37867.