Training-Free Hybrid Evidence Retrieval for Question Answering: Dynamic Fusion of Knowledge-Graph Triples and Dense Text Embeddings

  • Otávio Calaça Xavier, Federal University of Goiás
  • Anderson da Silva Soares, Federal University of Goiás

Abstract


This work presents a training-free retrieval pipeline with sub-50 ms latency that combines a Neo4j knowledge graph with a ChromaDB vector index. Questions and passages are embedded with Sentence-BERT, and the entities retrieved from the vector index seed a one-hop Cypher expansion in the knowledge graph. A transparent fusion step based on Dice-Sørensen overlap then ranks passages and triples together. On the WebQSP and CQA-12k benchmarks, this hybrid method achieves higher Recall@10, MRR, and nDCG@10 than BM25, graph-only, and vector-only baselines. Because it requires no learned parameters and runs on commodity hardware, it offers a practical alternative to heavyweight neural re-rankers and a robust evidence layer for retrieval-augmented generation (RAG).
Keywords: Knowledge Graphs, Information Retrieval, Natural Language Processing, Embeddings
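
As a rough illustration of the fusion step described in the abstract, the sketch below scores passages and knowledge-graph triples against a question using the Dice-Sørensen coefficient over token sets, then merges them into a single ranking. The abstract does not specify how overlap is computed, so the word-level tokenizer, the helper names (`tokens`, `dice`, `rank_evidence`), and the verbalization of a triple as a space-joined string are all assumptions made for illustration, not the authors' implementation.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercased word tokens; a deliberately simple tokenizer (assumption)."""
    return set(re.findall(r"\w+", text.lower()))


def dice(a: set[str], b: set[str]) -> float:
    """Dice-Sørensen coefficient: 2|A ∩ B| / (|A| + |B|)."""
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def rank_evidence(question, passages, triples):
    """Score passages and triples on one scale and return them best-first.

    Each triple (subject, predicate, object) is flattened to a string so
    that it can be compared with the question in the same way as a passage.
    """
    q = tokens(question)
    candidates = [("passage", p) for p in passages]
    candidates += [("triple", " ".join(t)) for t in triples]
    scored = [(dice(q, tokens(text)), kind, text) for kind, text in candidates]
    return sorted(scored, key=lambda item: item[0], reverse=True)


if __name__ == "__main__":
    ranked = rank_evidence(
        "Who directed Inception?",
        passages=[
            "Inception is a 2010 film directed by Christopher Nolan.",
            "Paris is the capital of France.",
        ],
        triples=[("Inception", "directed_by", "Christopher Nolan")],
    )
    for score, kind, text in ranked:
        print(f"{score:.3f}  {kind:8s} {text}")
```

Because the score is a closed-form set overlap, it needs no training and each ranking decision can be traced back to the shared tokens, which is consistent with the "transparent" and "training-free" framing of the abstract. In the pipeline described there, the candidate passages would come from the ChromaDB index and the candidate triples from the one-hop Cypher expansion.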

Published
2025-09-29
XAVIER, Otávio Calaça; SOARES, Anderson da Silva. Training-Free Hybrid Evidence Retrieval for Question Answering: Dynamic Fusion of Knowledge-Graph Triples and Dense Text Embeddings. In: BRAZILIAN SYMPOSIUM ON DATABASES (SBBD), 40., 2025, Fortaleza/CE. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 644-657. ISSN 2763-8979. DOI: https://doi.org/10.5753/sbbd.2025.247297.