Common Portuguese proverbs: distribution in corpora
Abstract
Proverbs are a special type of linguistic unit that has been largely ignored by the Natural Language Processing (NLP) community, even though they raise interesting processing challenges. This paper presents the procedure for integrating the Portuguese Paremiological Minimum (Mínimo Paremiológico do Português) into the STRING system and reports the distribution of these most common proverbs in three distinct corpora of (European) Portuguese.