Comparison of Embedding Models and LLMs for Retrieval-Augmented Generation in Portuguese

Luiz Sabiano Ferreira Medeiros (IFES); Hilário Tomaz Alves de Oliveira (IFES)

DOI: https://doi.org/10.5753/semish.2025.9027

Abstract

Large language models (LLMs) represent an advance in natural language processing, improving performance on tasks such as text generation and question answering. However, they face challenges such as hallucinations and a lack of access to up-to-date information. Retrieval-augmented generation (RAG) seeks to mitigate these problems by integrating the retrieval of external information into text generation, improving the accuracy and currency of the responses. This work investigates several open-source and proprietary embedding models and LLMs applied to RAG, using three datasets of documents written in Brazilian Portuguese. The experimental results show that the Multilingual E5 large and Gemma 2 9B models achieved the best performance among the evaluated models across different evaluation measures.
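To make the RAG idea concrete, the following is a minimal sketch of the retrieve-then-prompt flow, not the pipeline used in this work. The `embed` function is a toy bag-of-words stand-in for a real embedding model such as Multilingual E5, and the function names, sample documents, and prompt layout are all hypothetical, chosen only for illustration.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase bag-of-words counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, documents: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the question."""
    q = embed(question)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, passages: list[str]) -> str:
    """Place the retrieved passages before the question for the LLM."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


# Hypothetical mini-corpus; a real system would index many documents.
documents = [
    "Brasília is the capital of Brazil.",
    "The Amazon river flows through northern Brazil.",
    "Python is a programming language.",
]
question = "What is the capital of Brazil?"
passages = retrieve(question, documents, k=1)
prompt = build_prompt(question, passages)
print(prompt)
```

In a real setup, `embed` would call a dense embedding model and the resulting prompt would be sent to an LLM. Comparing alternatives for those two components is the kind of evaluation the abstract describes.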
