Corpus Memórias Paroquiais: Advances in Entity Recognition

  • Renata Vieira, Universidade de Évora / CIDEHUS
  • Helena Cameron, Universidade de Évora / CIDEHUS / Instituto Politécnico de Portalegre
  • Fernanda Olival, Universidade de Évora / CIDEHUS
  • Joaquim Santos, UNISINOS

Abstract

This article presents recent advances in named entity recognition (NER) for the Memórias Paroquiais (Parish Memories) corpus. The corpus has been enriched with additional annotations that introduce categories dedicated to fauna and flora. We also discuss a study of how well the model adapts to the original, non-normalized data.

Published
29/09/2025
VIEIRA, Renata; CAMERON, Helena; OLIVAL, Fernanda; SANTOS, Joaquim. Corpus Memórias Paroquiais: Avanços em Reconhecimento de Entidades. In: SIMPÓSIO BRASILEIRO DE TECNOLOGIA DA INFORMAÇÃO E DA LINGUAGEM HUMANA (STIL), 16., 2025, Fortaleza/CE. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 478-489. DOI: https://doi.org/10.5753/stil.2025.37848.