Using a morphological dictionary to expand the lexical coverage of a Portuguese grammar in the HPSG formalism
Abstract
Broad lexical coverage is one of the prerequisites for a robust computational grammar. We propose a methodology for populating PorGram, a Portuguese grammar in the HPSG formalism, with irregular verb forms drawn from MorphoBr. We implemented an algorithm that applies the inflectional rules of PorGram to classify the verb forms of MorphoBr as regular or irregular. We evaluated the algorithm on a sample of 38 verbs, both regular and irregular, and obtained the expected results. An additional contribution of this work is the improvement of MorphoBr itself: more than 270,000 incorrect entries were eliminated and almost 13,000 missing entries were added.
References
Beesley, K. R. and Karttunen, L. (2003). Finite state morphology. CSLI, Stanford, California.
Bender, E. M., Drellishak, S., Fokkens, A., Poulson, L., and Saleem, S. (2010). Grammar customization. Research on Language & Computation, 8(1):23–72. doi:10.1007/s11168-010-9070-1.
Branco, A. and Costa, F. (2014). A computational grammar for deep linguistic processing of Portuguese: LXGram (version 5). Technical report, Universidade de Lisboa, Departamento de Informática.
Copestake, A. (2002). Implementing typed feature structure grammars. CSLI, Stanford, California.
Costa, F. and Branco, A. (2010). LXGram: A deep linguistic processing grammar for Portuguese. In Pardo, T. A. S., Branco, A., Klautau, A., Vieira, R., and de Lima, V. L. S., editors, Computational Processing of the Portuguese Language, pages 86–89, Berlin, Heidelberg. Springer Berlin Heidelberg.
Cunha, C., Cintra, L. F. L., et al. (1985). Nova gramática do português contemporâneo. Nova Fronteira, Rio de Janeiro.
de Alencar, L. F., Cuconato, B., and Rademaker, A. (2018). MorphoBr: An open source large-coverage full-form lexicon for morphological analysis of Portuguese. Texto Livre: Linguagem e Tecnologia, 11(3):1–25.
Eleutério, S., Freire, H., Ranchhod, E., and Baptista, J. (1995). A system of electronic dictionaries of Portuguese. Lingvisticae Investigationes, 19(1):57–82.
Ferrucci, D., Brown, E., Chu-Carroll, J., Fan, J., Gondek, D., Kalyanpur, A. A., Lally, A., Murdock, J. W., Nyberg, E., Prager, J., Schlaefer, N., and Welty, C. (2010). Building Watson: An overview of the DeepQA project. AI Magazine, 31(3):59–79.
Flickinger, D. (2000). On building a more efficient grammar by exploiting types. Natural Language Engineering, 6(1):15–28.
Goodman, M. W. (2013). Generation of machine-readable morphological rules with human readable input. University of Washington Working Papers in Linguistics, 30:1–34.
McCord, M. C., Murdock, J. W., and Boguraev, B. K. (2012). Deep parsing in Watson. IBM Journal of Research and Development, 56(3.4):3:1–3:15.
Muniz, M. C. M. (2004). A construção de recursos linguístico-computacionais para o português do Brasil: o projeto de Unitex-PB. Master’s thesis, Instituto de Ciências Matemáticas e de Computação, USP.
Pollard, C. (1994). Head-driven Phrase Structure Grammar. CSLI, Chicago, Illinois.
Sag, I. A., Wasow, T., and Bender, E. M. (2003). Syntactic theory: A formal introduction. University of Chicago Press, Chicago, second edition.
Silva, H. L. B. (2019). Expansão do MorphoBr através da modelagem computacional de processos de formação de palavras em português. Master’s thesis, Universidade Federal do Ceará, Brazil.
Published
2021-11-29
How to Cite
NUNES, Ana Luiza; RADEMAKER, Alexandre; ALENCAR, Leonel Figueiredo de. Utilizando um dicionário morfológico para expandir a cobertura lexical de uma gramática do português no formalismo HPSG. In: BRAZILIAN SYMPOSIUM IN INFORMATION AND HUMAN LANGUAGE TECHNOLOGY (STIL), 13., 2021, Evento Online. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2021. p. 11-18. DOI: https://doi.org/10.5753/stil.2021.17779.
