Enhancing Retrieval-Augmented Generation through Sequential Fine-Tuning of Small Language Models

Abstract


Language Models (LMs) excel at general knowledge but often struggle in specialized domains, where complexity and constant evolution pose additional obstacles. This study enhances the performance of Retrieval-Augmented Generation (RAG) systems on the Question Answering (QA) task through sequential fine-tuning of the RAG components, employing Small Language Models (SLMs). Our approach adapts both the embedding model and the generative model using minimal computational resources and improves overall effectiveness compared to a vanilla RAG baseline. The proposed methodology is scalable and cost-effective, enabling the practical application of RAG systems across different domains and tasks.
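To make the two-stage idea concrete, the sketch below illustrates one plausible way to fine-tune a retriever and then a generative SLM with LoRA, assuming the Hugging Face sentence-transformers, transformers, and peft libraries. The model names (BAAI/bge-small-en-v1.5, microsoft/phi-2), hyperparameters, and toy training pair are illustrative assumptions, not the paper's actual configuration.

```python
# A minimal sketch of sequential fine-tuning for a RAG pipeline, under the
# assumptions stated above; it is not the authors' implementation.
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

# Stage 1: adapt the embedding model (retriever) on in-domain
# (question, passage) pairs with in-batch negatives.
embedder = SentenceTransformer("BAAI/bge-small-en-v1.5")  # assumed small embedder
pairs = [  # toy example; real training would use in-domain QA/passage pairs
    InputExample(texts=["What does Release 18 cover?",
                        "Release 18 specifies enhancements to ..."]),
]
loader = DataLoader(pairs, shuffle=True, batch_size=1)
loss = losses.MultipleNegativesRankingLoss(embedder)
embedder.fit(train_objectives=[(loader, loss)], epochs=1, warmup_steps=0)
embedder.save("domain-embedder")  # the adapted embedder then re-indexes the corpus

# Stage 2: adapt the generative SLM with LoRA adapters, so that only a
# small fraction of the parameters is updated.
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2")
model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2")
lora_cfg = LoraConfig(
    r=8, lora_alpha=16, lora_dropout=0.05,  # illustrative hyperparameters
    target_modules=["q_proj", "k_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()  # typically well under 1% of base weights
# Training then proceeds on (retrieved context + question -> answer) text
# with a standard causal-LM loss, e.g., via transformers.Trainer.
```

Running the two stages in sequence (retriever first, generator second) lets the generative model be tuned against contexts produced by the already-adapted retriever, which is the property that distinguishes sequential from independent fine-tuning.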

Keywords: Retrieval-Augmented Generation, Fine-Tuning, LoRA, Small Language Models, Information Retrieval, Specialized Domains

Published: 2025-09-29
VEGA CENTENO OLIVERA, Ronaldinho; SANTOS, Frances A.; DOS REIS, Julio Cesar; DE SOUZA, Allan M. Enhancing Retrieval-Augmented Generation through Sequential Fine-Tuning of Small Language Models. In: BRAZILIAN SYMPOSIUM ON DATABASES (SBBD), 40., 2025, Fortaleza/CE. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 250-263. ISSN 2763-8979. DOI: https://doi.org/10.5753/sbbd.2025.247070.