Syntactic parsing: where are we going?
Abstract
In this review and opinion paper, we discuss the options and challenges for syntactic parsing. Despite significant advances in recent years, driven primarily by neural network architectures, parsing accuracy appears to be approaching a plateau. We reflect on the factors that may be influencing these results and suggest some future paths.
Keywords:
Tools and Resources for NLP, Syntactic representations, Parsing
Published
17/11/2024
How to Cite
LOPES, Lucelene; PARDO, Thiago Alexandre Salgueiro; DURAN, Magali S. Syntactic parsing: where are we going? In: SIMPÓSIO BRASILEIRO DE TECNOLOGIA DA INFORMAÇÃO E DA LINGUAGEM HUMANA (STIL), 15., 2024, Belém/PA. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2024. p. 67-74. DOI: https://doi.org/10.5753/stil.2024.245043.