Adapting LLMs to New Domains: A Comparative Study of Fine-Tuning and RAG Strategies for Portuguese QA Tasks

Abstract


The rise of Large Language Models (LLMs) has brought significant advances in text generation applications. However, LLMs struggle in domains outside the scope of their original training. This study investigates two approaches for adapting LLMs to new domains in the context of generative question answering (QA) over Portuguese data: fine-tuning and Retrieval-Augmented Generation (RAG). Our experiments demonstrate the effectiveness of incorporating external data sources, even for models that were not adjusted to the specific domain. Moreover, combining supervised fine-tuning with RAG proved to be the most effective approach.
Keywords: Retrieval-Augmented Generation, RAG, Fine-Tuning, LLMs, Portuguese Question Answering
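To make the comparison concrete, the following is a minimal sketch of the RAG side of the study: a multilingual sentence-embedding retriever selects Portuguese passages, and a causal language model answers conditioned on them. The model names, toy passages, and prompt format are illustrative assumptions, not the authors' exact setup.

# Illustrative RAG sketch (assumed models and toy data, not the paper's actual pipeline).
import numpy as np
from sentence_transformers import SentenceTransformer
from transformers import AutoModelForCausalLM, AutoTokenizer

# Dense retriever: any multilingual sentence-embedding model that covers Portuguese.
retriever = SentenceTransformer("sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2")

# Generator: small public Portuguese checkpoint used here as a stand-in;
# a supervised fine-tuned LLM could be loaded instead.
gen_name = "pierreguillou/gpt2-small-portuguese"
tokenizer = AutoTokenizer.from_pretrained(gen_name)
generator = AutoModelForCausalLM.from_pretrained(gen_name)

# Toy Portuguese knowledge base; in practice this would be the domain corpus.
passages = [
    "O oceano cobre cerca de 70% da superfície da Terra.",
    "A Zona Econômica Exclusiva brasileira é conhecida como Amazônia Azul.",
]
passage_emb = retriever.encode(passages, normalize_embeddings=True)

def answer(question: str, k: int = 2) -> str:
    # Retrieve the k passages most similar to the question (cosine similarity).
    q_emb = retriever.encode([question], normalize_embeddings=True)[0]
    top = np.argsort(-(passage_emb @ q_emb))[:k]
    context = "\n".join(passages[i] for i in top)
    # Condition the generator on the retrieved context (RAG-style prompting).
    prompt = f"Contexto:\n{context}\n\nPergunta: {question}\nResposta:"
    inputs = tokenizer(prompt, return_tensors="pt")
    output = generator.generate(**inputs, max_new_tokens=64)
    return tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)

print(answer("O que é a Amazônia Azul?"))

Combining the two strategies, which the abstract reports as the most effective configuration, would amount to substituting a supervised fine-tuned checkpoint for the stand-in generator while keeping the retrieval step unchanged.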

Published
17/11/2024
DA COSTA, Leandro Yamachita; OLIVEIRA E SOUZA FILHO, João Baptista de. Adapting LLMs to New Domains: A Comparative Study of Fine-Tuning and RAG Strategies for Portuguese QA Tasks. In: SIMPÓSIO BRASILEIRO DE TECNOLOGIA DA INFORMAÇÃO E DA LINGUAGEM HUMANA (STIL), 15., 2024, Belém/PA. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2024. p. 267-277. DOI: https://doi.org/10.5753/stil.2024.245443.