Advanced Retrieval Augmented Generation for Local LLMs

  • Leonardo Marques Rocha Instituto Atlântico
  • Rian Manoel Pessoa Instituto Atlântico

Abstract


This paper presents an extension of the Retrieval Augmented Generation workflow for Large Language Models running on a local processor with limited resources. The novelty is an improvement to such workflows that accounts for a limited total token budget and preserves data privacy by storing data locally. This can enable applications that do not depend on online services, which incur high cost and latency due to the Chain of Thought used in most data retrieval cases. The proposed workflow is computationally lightweight and can be fully implemented in a low-resource compute environment.
Keywords: Large Language Models, workflow, limited resources
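The abstract's core idea of retrieval under a fixed token budget can be illustrated with a minimal sketch. This is not the authors' implementation: the chunk scores stand in for similarity results from a local vector store, and the whitespace-based token estimate is a placeholder for the local model's real tokenizer.

```python
# Hypothetical sketch: pack locally retrieved chunks into a prompt
# context without exceeding a fixed token budget.

def estimate_tokens(text: str) -> int:
    """Rough token count; a real system would use the model's tokenizer."""
    return len(text.split())

def pack_context(chunks, budget: int):
    """Greedily keep the highest-scoring chunks that fit the budget.

    `chunks` is a list of (score, text) pairs, e.g. similarity hits
    from a local vector store; higher score means more relevant.
    """
    selected, used = [], 0
    for score, text in sorted(chunks, key=lambda c: c[0], reverse=True):
        cost = estimate_tokens(text)
        if used + cost <= budget:
            selected.append(text)
            used += cost
    return selected, used

# Example data (illustrative only):
chunks = [
    (0.9, "Local vector stores keep embeddings on disk for privacy."),
    (0.7, "Quantized models reduce memory use on limited hardware."),
    (0.4, "A long low-relevance passage " + "word " * 50),
]
context, used = pack_context(chunks, budget=20)
```

Greedy packing by relevance score is one simple policy; the point is only that the retrieval step, not just the generation step, must respect the token budget on resource-constrained hardware.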

Published: 2024-11-17

ROCHA, Leonardo Marques; PESSOA, Rian Manoel. Advanced Retrieval Augmented Generation for Local LLMs. In: NATIONAL MEETING ON ARTIFICIAL AND COMPUTATIONAL INTELLIGENCE (ENIAC), 21., 2024, Belém/PA. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2024. p. 767-776. ISSN 2763-9061. DOI: https://doi.org/10.5753/eniac.2024.245238.