Assessing European and Brazilian Portuguese LLMs for NER in Specialised Domains

Rafael Oleques Nunes (UFRGS)
Joaquim Santos (UNISINOS)
Andre Spritzer (UFRGS)
Dennis Giovani Balreira (UFRGS)
Carla Maria Dal Sasso Freitas (UFRGS)
Fernanda Olival (Portalegre Polytechnic University)
Helena Freire Cameron (Portalegre Polytechnic University)
Renata Vieira (University of Évora)

Abstract

This paper discusses the impact of Portuguese language variants in Large Language Models on named entity recognition (NER) in specialised domains. The experiments were run on two corpora: a Brazilian Portuguese legal corpus and a European Portuguese historical corpus. The models considered are BERTimbau (PT-BR), Albertina (PT-PT and PT-BR), and XLM-R (multilingual). The impact was more evident on the European Portuguese historical corpus, where the F1 scores were higher than those of previous work that did not consider the same language variant. The study also underscores the impact of model architecture on performance, highlighting the critical role of both linguistic alignment and model size in enhancing NER in specialised domains.

Springer (English)

Published

17 November 2024

How to Cite

NUNES, Rafael Oleques; SANTOS, Joaquim; SPRITZER, Andre; BALREIRA, Dennis Giovani; FREITAS, Carla Maria Dal Sasso; OLIVAL, Fernanda; CAMERON, Helena Freire; VIEIRA, Renata. Assessing European and Brazilian Portuguese LLMs for NER in Specialised Domains. In: BRAZILIAN CONFERENCE ON INTELLIGENT SYSTEMS (BRACIS), 13., 2024, Belém/PA. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2024. p. 215-230. ISSN 2643-6264.