PropBank and semantic role annotation for Portuguese: What's new?
Abstract
This paper introduces the Porttinari-base PropBank (PBP): the Porttinari-base corpus enriched with a layer of semantic role annotation. The annotation was carried out on top of syntactic dependencies, using linguistic rules and under human inspection. More than 40 thousand arguments were annotated, and the results are discussed in light of work investigating how well the PropBank classes generalize.
Keywords:
PropBank, semantic annotation, semantic roles, Universal Dependencies
Published
17 November 2024
How to Cite
FREITAS, Cláudia; PARDO, Thiago Alexandre Salgueiro. PropBank e anotação de papéis semânticos para a língua portuguesa: O que há de novo? In: SIMPÓSIO BRASILEIRO DE TECNOLOGIA DA INFORMAÇÃO E DA LINGUAGEM HUMANA (STIL), 15., 2024, Belém/PA. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2024. p. 118-128. DOI: https://doi.org/10.5753/stil.2024.245377.