RAG on Multimodal Databases: Orchestrating Textual, Vector, and Graph-based Retrieval

Otávio Calaça Xavier; Anderson da Silva Soares

doi:10.5753/sbbd_estendido.2025.tutorial3

Otávio Calaça Xavier, Universidade Federal de Goiás
Anderson da Silva Soares, Universidade Federal de Goiás

Abstract

This tutorial covers contemporary Information Retrieval (IR) techniques for building Retrieval-Augmented Generation (RAG) systems from a multimodal database perspective. It addresses textual retrieval (e.g., Full-Text Search), vector search through native database extensions and dedicated vector stores (e.g., pgvector, ChromaDB), and Knowledge Graphs queried with Cypher/GQL. Focusing on the challenge of hybrid search, the tutorial presents evaluation metrics (e.g., Recall@K, MRR, NDCG@K) and relevance fusion techniques such as Reciprocal Rank Fusion (RRF). Finally, it demonstrates the construction of an end-to-end RAG pipeline that orchestrates these data sources to augment a Large Language Model (LLM). Participants will learn how to design and implement hybrid retrieval systems that enrich text generation with relevant, structured, and verifiable data.
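
As a rough illustration of the fusion and evaluation steps named above, the following minimal sketch fuses three ranked lists with RRF and scores the result with Recall@K, reciprocal rank (MRR is its mean over a query set), and binary-relevance NDCG@K. The document ids, the three example rankings, the function names, and the smoothing constant k=60 are assumptions chosen for illustration; they are not taken from the tutorial material.

```python
import math
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse ranked lists of document ids with Reciprocal Rank Fusion.

    Each document receives sum(1 / (k + rank)) over the lists it appears in.
    k=60 is a commonly used default, assumed here.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest score first; ties broken by document id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


def recall_at_k(ranked, relevant, k):
    """Fraction of the relevant documents found in the top k results."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant result, or 0.0 if none is retrieved."""
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ranked, relevant, k):
    """NDCG@K with binary relevance (gain 1 for relevant, 0 otherwise)."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc_id in enumerate(ranked[:k], start=1)
        if doc_id in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg else 0.0


# Hypothetical outputs of three retrievers for a single query.
full_text_hits = ["d1", "d2", "d3"]
vector_hits = ["d3", "d1", "d4"]
graph_hits = ["d3", "d5"]

fused = rrf_fuse([full_text_hits, vector_hits, graph_hits])
relevant = {"d3", "d5"}  # hypothetical ground-truth judgments

print(fused)
print(recall_at_k(fused, relevant, k=3))
print(reciprocal_rank(fused, relevant))
print(ndcg_at_k(fused, relevant, k=3))
```

RRF operates on ranks only, so the BM25 scores, vector distances, and graph-traversal results of the individual retrievers need no score normalization before fusion.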

Keywords: Information Retrieval, Knowledge Graphs, Semantic Search, Embeddings, RAG
