Lexical noun phrase chunking with Universal Dependencies for Portuguese

Abstract


Partial parsing recovers a limited amount of syntactic information from a sentence. This work describes the identification, through partial syntactic analysis, of a specific type of noun phrase, the lexical noun phrase (NPL), in Brazilian Portuguese texts annotated according to the Universal Dependencies (UD) formalism. The Transformation-Based Learning algorithm (TBL–Brill), applied as a baseline, obtained an accuracy of 87.42% using the UD dependency relations and 91.44% using the UD morphosyntactic tags. Two other classifiers, one based on binary trees and the other on a decision forest, performed worse.
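
To make the idea of transformation-based chunking concrete, the following is a minimal sketch and not the authors' implementation. It assumes a B/I/O encoding of noun-phrase chunks, a baseline that assigns each UD morphosyntactic (UPOS) tag its most frequent chunk tag, and a single rule template of the form "change tag X to Y when the previous tag is Z". The toy sentences, their annotation, and all function names are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: (word, UPOS tag, gold chunk tag) per token.
TRAIN = [
    [("o", "DET", "B"), ("menino", "NOUN", "I"), ("viu", "VERB", "O"),
     ("a", "DET", "B"), ("casa", "NOUN", "I")],
    [("casas", "NOUN", "B"), ("caem", "VERB", "O")],
    [("o", "DET", "B"), ("carro", "NOUN", "I"), ("quebrou", "VERB", "O")],
]
TAGS = ("B", "I", "O")
START = "<s>"  # assumed pseudo-tag for the position before the first token


def train_baseline(sentences):
    """Map each UPOS tag to its most frequent chunk tag."""
    counts = defaultdict(Counter)
    for sent in sentences:
        for _, upos, chunk in sent:
            counts[upos][chunk] += 1
    return {upos: c.most_common(1)[0][0] for upos, c in counts.items()}


def apply_rule(tags, rule):
    """Apply (from_tag, to_tag, prev_tag), reading context from the input tags."""
    frm, to, prev = rule
    out = list(tags)
    for i, tag in enumerate(tags):
        left = tags[i - 1] if i > 0 else START
        if tag == frm and left == prev:
            out[i] = to
    return out


def correct(pred, gold):
    """Count positions where predicted and gold tags agree."""
    return sum(p == g for ps, gs in zip(pred, gold) for p, g in zip(ps, gs))


def accuracy(pred, gold):
    return correct(pred, gold) / sum(len(gs) for gs in gold)


def learn_rule(pred, gold):
    """Pick the candidate rule with the largest net gain in correct tags."""
    best, best_gain = None, 0
    for frm in TAGS:
        for to in TAGS:
            if frm == to:
                continue
            for prev in TAGS + (START,):
                rule = (frm, to, prev)
                new_pred = [apply_rule(p, rule) for p in pred]
                gain = correct(new_pred, gold) - correct(pred, gold)
                if gain > best_gain:
                    best, best_gain = rule, gain
    return best, best_gain


gold = [[chunk for _, _, chunk in sent] for sent in TRAIN]
baseline = train_baseline(TRAIN)
pred = [[baseline[upos] for _, upos, _ in sent] for sent in TRAIN]
acc_before = accuracy(pred, gold)

rule, gain = learn_rule(pred, gold)
pred_after = [apply_rule(p, rule) for p in pred]
acc_after = accuracy(pred_after, gold)

print(rule, acc_before, acc_after)
```

On this toy data the baseline mislabels the sentence-initial noun as chunk-internal, and the learned rule rewrites I to B at the start of a sentence. A full transformation-based learner would repeat the rule search until no rule yields a positive gain and would use richer templates, for example conditioning on neighbouring UPOS tags or dependency relations.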

Keywords: Lexical noun phrase, shallow parsing, Universal Dependencies

Published
25 September 2023
DE SOUZA, Aleksander Tomaz; RUIZ, Evandro Eduardo Seron. Lexical noun phrase chunking with Universal Dependencies for Portuguese. In: SIMPÓSIO BRASILEIRO DE TECNOLOGIA DA INFORMAÇÃO E DA LINGUAGEM HUMANA (STIL), 14., 2023, Belo Horizonte/MG. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2023. p. 414-423. DOI: https://doi.org/10.5753/stil.2023.25482.