Optimizing and Evaluating a Retrieval-Augmented Generation System for Normative Document Retrieval in Hospital Settings

  • Murilo Vargas da Cunha UFPEL / IFRS
  • Marilia Rosa Silveira UFPEL
  • Brenda Salenave Santana UFPEL
  • Larissa Astrogildo Freitas UFPEL
  • Ulisses Brisolara Corrêa UFPEL

Abstract


This paper presents the development and evaluation of a chatbot for querying Portuguese-language documents on regulatory procedures in a hospital environment. The chatbot uses a Retrieval-Augmented Generation (RAG) pipeline to improve the factual accuracy and relevance of the answers produced by its underlying Large Language Model (LLM). The RAG approach enables more efficient and accurate retrieval of information from hospital manuals and institutional documents, helping staff quickly find internal guidelines and procedures. The objective is to optimize each system component (retrieval, re-ranking, and generation) and to analyze the impact of each stage on a RAG system for a low-resource language such as Portuguese. The methodology comprises three stages: (1) preparation of a golden set of question-answer pairs; (2) comparison of three embedding models for initial retrieval and of three re-ranking methods, namely a Cross-Encoder, Reciprocal Rank Fusion (RRF), and an LLM-based re-ranker, evaluated with ranking metrics such as Mean Reciprocal Rank (MRR) and Normalized Discounted Cumulative Gain (NDCG@10); and (3) comparison of two generative models (Gemini 1.5 Flash and GPT-4o-mini) using BERTScore. The results indicate that, among the embedding models compared, intfloat/multilingual-e5-small yields the fewest retrieval failures. In the re-ranking stage, the LLM-based re-ranker achieved the highest ranking accuracy, while the computationally lighter RRF method proved a strong, cost-effective alternative. We conclude that the configuration that best balances efficiency and performance combines the intfloat/multilingual-e5-small embedding model, the RRF re-ranker, and the Gemini 1.5 Flash generator.
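The abstract names RRF as one of the re-ranking methods and MRR and NDCG@10 as ranking metrics, but does not define them. The Python sketch below shows one standard way to compute all three; it is not the authors' code. Assumptions: the two input rankings (a dense retriever and BM25) and the chunk IDs are hypothetical, relevance judgments are binary, and the RRF constant k = 60 is a common default rather than a value reported in the paper.

```python
import math
from collections import defaultdict


def rrf(rankings, k=60):
    """Reciprocal Rank Fusion: score(d) = sum over input lists of 1 / (k + rank(d)).

    k = 60 is an assumed, commonly used default, not a value taken from the paper.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


def reciprocal_rank(ranking, relevant):
    """1 / rank of the first relevant document, or 0.0 if none was retrieved."""
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def mrr(rankings, relevant_sets):
    """Mean Reciprocal Rank: average of reciprocal_rank over all queries."""
    pairs = list(zip(rankings, relevant_sets))
    return sum(reciprocal_rank(r, rel) for r, rel in pairs) / len(pairs)


def ndcg_at_k(ranking, relevant, k=10):
    """NDCG@k with binary gains and a log2(rank + 1) position discount."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc_id in enumerate(ranking[:k], start=1)
        if doc_id in relevant
    )
    # Ideal DCG: every relevant document placed at the top of the list.
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical rankings for a single question from two retrievers.
    dense = ["c3", "c1", "c7", "c2"]
    bm25 = ["c1", "c4", "c3", "c9"]
    fused = rrf([dense, bm25])
    print(fused)  # → ['c1', 'c3', 'c4', 'c7', 'c2', 'c9']
    relevant = {"c1"}
    print(reciprocal_rank(fused, relevant), ndcg_at_k(fused, relevant))  # → 1.0 1.0
```

RRF only uses rank positions, never raw scores, so it needs no score calibration across retrievers and no extra model inference. This is consistent with the abstract's point that it is much cheaper than a Cross-Encoder or an LLM-based re-ranker.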

Keywords: Chatbot, Retrieval-Augmented Generation, Low-resource languages, Large language models

Published
10 Nov. 2025
CUNHA, Murilo Vargas da; SILVEIRA, Marilia Rosa; SANTANA, Brenda Salenave; FREITAS, Larissa Astrogildo; CORRÊA, Ulisses Brisolara. Optimizing and Evaluating a Retrieval-Augmented Generation System for Normative Document Retrieval in Hospital Settings. In: BRAZILIAN SYMPOSIUM ON MULTIMEDIA AND THE WEB (WEBMEDIA), 31., 2025, Rio de Janeiro/RJ. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 385-393. DOI: https://doi.org/10.5753/webmedia.2025.16029.