Enhancing Text-to-SQL with In-Context Learning: A Multi-Agent Approach Based on CHESS
Abstract
Text-to-SQL has gained increasing attention with the rise of Large Language Models (LLMs). While existing architectures have demonstrated the potential of multi-agent systems, there remains significant room for improvement. In this work, we extend the CHESS framework by integrating In-Context Learning (ICL) techniques into its Candidate Generator module, evaluating three strategies: Zero-Shot, Few-Shot Learning, and Retrieval-Augmented Generation (RAG). We implement the system with GPT-4o and run experiments on the financial database of the BIRD-SQL benchmark. Results show that Few-Shot Learning and RAG significantly outperform the Zero-Shot baseline (59.31% Execution Accuracy (EX), 0.412 ROUGE-1), with RAG yielding the largest gains, raising EX to 69.48% and ROUGE-1 to 0.652.
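To make the RAG strategy concrete, the sketch below illustrates one way a Candidate Generator prompt could be assembled: the incoming question is matched against a pool of solved (question, SQL) pairs, and the most similar ones are prepended as in-context examples before the schema and target question. This is a minimal illustration under stated assumptions; the function names, the toy example pool, and the TF-IDF retriever are placeholders for the sketch, not the implementation evaluated in this paper.

# Minimal sketch of RAG-based example retrieval for a Candidate Generator.
# Assumptions: EXAMPLE_POOL stands in for solved pairs from a training split,
# and TF-IDF cosine similarity stands in for whatever retriever is used.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy pool of solved (question, SQL) pairs; hypothetical, for illustration only.
EXAMPLE_POOL = [
    ("How many accounts were opened in 1995?",
     "SELECT COUNT(*) FROM account WHERE STRFTIME('%Y', date) = '1995';"),
    ("List the districts with more than 100 clients.",
     "SELECT district_id FROM client GROUP BY district_id HAVING COUNT(*) > 100;"),
]

def retrieve_examples(question: str, k: int = 2) -> list[tuple[str, str]]:
    """Return the k pool entries whose questions are most similar to `question`."""
    pool_questions = [q for q, _ in EXAMPLE_POOL]
    matrix = TfidfVectorizer().fit_transform(pool_questions + [question])
    # Cosine similarity between the target question (last row) and the pool.
    scores = cosine_similarity(matrix[-1], matrix[:-1]).ravel()
    ranked = scores.argsort()[::-1][:k]
    return [EXAMPLE_POOL[i] for i in ranked]

def build_prompt(question: str, schema: str) -> str:
    """Assemble a few-shot prompt: retrieved examples, schema, then the target question."""
    shots = "\n\n".join(
        f"Question: {q}\nSQL: {sql}" for q, sql in retrieve_examples(question)
    )
    return f"{shots}\n\nSchema:\n{schema}\n\nQuestion: {question}\nSQL:"

print(build_prompt("How many clients live in each district?",
                   "client(client_id, district_id, ...)"))

The resulting prompt string would then be sent to the underlying LLM (GPT-4o in our experiments); the Zero-Shot variant corresponds to omitting the retrieved examples entirely.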
References
Hong, Z., Yuan, Z., Zhang, Q., Chen, H., Dong, J., Huang, F., and Huang, X. (2024). Next-generation database interfaces: A survey of LLM-based text-to-SQL. ArXiv.
Katsogiannis-Meimarakis, G. and Koutrika, G. (2023). A survey on deep learning approaches for text-to-SQL. The VLDB Journal, 32:905–936.
Lewis, P., Perez, E., Piktus, A., Petroni, F., and Karpukhin, V. (2020). Retrieval-augmented generation for knowledge-intensive NLP tasks. In Proceedings of the 34th International Conference on Neural Information Processing Systems, NIPS ’20, Red Hook, NY, USA. Curran Associates Inc.
Li, J., Hui, B., Cheng, R., Qin, B., and Ma, C. (2023). Graphix-T5: Mixing pre-trained transformers with graph-aware layers for text-to-SQL parsing. In Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence. AAAI Press.
Li, J., Hui, B., Qu, G., Yang, J., Li, B., Li, B., Wang, B., Qin, B., Geng, R., Huo, N., et al. (2024). Can LLM already serve as a database interface? A big bench for large-scale database grounded text-to-SQLs. Advances in Neural Information Processing Systems, 36.
Nascimento, E. and Casanova, M. A. (2024). Querying databases with natural language: The use of large language models for text-to-SQL tasks. In Anais Estendidos do XXXIX Simpósio Brasileiro de Bancos de Dados, pages 196–201, Porto Alegre, RS, Brasil. SBC.
OpenAI (2025). GPT-4o. Available at: [link]. Accessed on 21 April 2025.
Pourreza, M. and Rafiei, D. (2023). DIN-SQL: Decomposed in-context learning of text-to-SQL with self-correction. In Proceedings of the 37th International Conference on Neural Information Processing Systems, NIPS ’23, Red Hook, NY, USA. Curran Associates Inc.
Talaei, S., Pourreza, M., Chang, Y., Mirhoseini, A., and Saberi, A. (2024). CHESS: Contextual harnessing for efficient SQL synthesis. ArXiv.
Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E. H., Le, Q. V., and Zhou, D. (2022). Chain-of-thought prompting elicits reasoning in large language models. In Proceedings of the 36th International Conference on Neural Information Processing Systems, NIPS ’22, Red Hook, NY, USA. Curran Associates Inc.
Yin, P., Neubig, G., Yih, W., and Riedel, S. (2020). TaBERT: Pretraining for joint understanding of textual and tabular data. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics.
Yu, T., Zhang, R., Yang, K., Yasunaga, M., and Wang, D. (2018). Spider: A large-scale human-labeled dataset for complex and cross-domain semantic parsing and text-to-SQL task. In Riloff, E., editor, Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 3911–3921, Brussels, Belgium. Association for Computational Linguistics.
