Developing Resource-Efficient Clinical LLMs for Brazilian Portuguese

  • João Gabriel de Souza Pinto PUCPR / Comsentimento NLP Lab
  • Andrey Rodrigues de Freitas PUCPR
  • Anderson Carlos Gomes Martins IFG
  • Caroline Midori Rozza Sawazaki PUCPR
  • Caroline Vidal PUCPR
  • Lucas Emanuel Silva e Oliveira PUCPR / Comsentimento NLP Lab

Abstract


In this study, we developed and evaluated two medical large language models, Clinical-BR-LlaMA-2-7B and Clinical-BR-Mistral-7B-v0.2, specifically designed for Brazilian Portuguese. Using the Low-Rank Adaptation (LoRA) technique, our models achieved significant improvements in generating synthetic clinical text, particularly in Authenticity of Format and Structure, Spelling Accuracy, and Clinical Coherence. An evaluation conducted by medical students using a 5-point Likert scale showed that both models outperformed their respective baselines, LlaMA-2-7B and Mistral-7B-v0.2. These results suggest that resource-efficient models can generate clinically relevant text while maintaining high standards of structure, accuracy, and coherence. Future work will focus on expanding datasets, refining evaluation protocols, and enhancing model robustness to further improve performance across various medical tasks.
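As a rough illustration of the LoRA technique mentioned above: instead of updating the full weight matrix W during fine-tuning, LoRA learns two small low-rank matrices A and B whose product, scaled by alpha/r, is added to the frozen W. The sketch below uses toy dimensions and values chosen for illustration, not the configurations or frameworks used in the study.

```python
# Toy sketch of the LoRA weight update: W_eff = W + (alpha/r) * B @ A.
# All dimensions and values here are illustrative assumptions.

def matmul(a, b):
    """Plain-Python matrix product of a (m x k) and b (k x n)."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(rows)]

def lora_effective_weight(W, A, B, alpha):
    """Add the scaled low-rank update B @ A to the frozen weight W."""
    r = len(A)            # rank of the adaptation (number of rows of A)
    scale = alpha / r
    delta = matmul(B, A)  # (d_out x r) @ (r x d_in) -> d_out x d_in
    return [[W[i][j] + scale * delta[i][j] for j in range(len(W[0]))]
            for i in range(len(W))]

# Toy example: 2x2 frozen weight with rank-1 adapters.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 2.0]]           # r x d_in  = 1 x 2
B = [[0.5], [0.25]]        # d_out x r = 2 x 1
W_eff = lora_effective_weight(W, A, B, alpha=1.0)
```

Only A and B (far fewer parameters than W in a 7B-parameter model) are trained, which is what makes the approach resource-efficient.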
Published
17/11/2024
PINTO, João Gabriel de Souza; FREITAS, Andrey Rodrigues de; MARTINS, Anderson Carlos Gomes; SAWAZAKI, Caroline Midori Rozza; VIDAL, Caroline; SILVA E OLIVEIRA, Lucas Emanuel. Developing Resource-Efficient Clinical LLMs for Brazilian Portuguese. In: BRAZILIAN CONFERENCE ON INTELLIGENT SYSTEMS (BRACIS), 13., 2024, Belém/PA. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2024. p. 46-60. ISSN 2643-6264.