RAISE: Reasoning Agent for Interactive SQL Exploration

Fernando F. Granado; Roberto Lotufo; Jayr Pereira

doi:10.5753/stil.2025.37823

Fernando F. Granado UNICAMP
Roberto Lotufo UNICAMP
Jayr Pereira UNICAMP / UFCA

Abstract

This work proposes an agentic framework that unifies schema linking, query generation, and iterative refinement for text-to-SQL within a single, end-to-end component. By leveraging the reasoning abilities of LLMs, our method emulates how a human interacts with a database: it builds an understanding of the data by forming hypotheses, validating them dynamically with queries, and refining its answer based on the results. We introduce a strategy that scales test-time computation by increasing the depth of interactive database exploration, rather than relying on traditional test-time scaling approaches. Our experiments show that our agent, equipped with additional steps that diversify its answers, achieves 81.8% Best-of-N accuracy with 8 candidate rounds, rivaling the top-ranked published solution (82.79%) while reducing engineering complexity.
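The hypothesize, query, and refine cycle described above can be pictured with a minimal sketch. It is not the authors' implementation: the `explore` function, the `Policy` type, the `scripted_policy` stand-in for the LLM, and the toy `customers` table are all hypothetical names introduced only for illustration, and SQLite is assumed purely for convenience.

```python
import sqlite3
from typing import Callable, List, Tuple

# Hypothetical types: a policy maps the question and the interaction history
# (pairs of SQL and observed result) to the next SQL string, plus a flag
# saying whether that SQL is proposed as the final answer.
History = List[Tuple[str, str]]
Policy = Callable[[str, History], Tuple[str, bool]]


def explore(conn: sqlite3.Connection, question: str, policy: Policy,
            max_steps: int = 5) -> Tuple[str, History]:
    """Run exploratory queries until the policy commits to a final query.

    max_steps bounds the exploration depth, i.e. how much test-time
    computation is spent on a single question.
    """
    history: History = []
    sql = ""
    for _ in range(max_steps):
        sql, is_final = policy(question, history)
        try:
            rows = conn.execute(sql).fetchall()
            observation = repr(rows[:5])  # show only a small sample of rows
        except sqlite3.Error as exc:
            observation = f"ERROR: {exc}"  # errors are fed back for refinement
        history.append((sql, observation))
        if is_final and not observation.startswith("ERROR"):
            break
    return sql, history


def scripted_policy(question: str, history: History) -> Tuple[str, bool]:
    """Stand-in for an LLM call (assumption): a fixed three-step script."""
    if not history:
        # Step 1: discover which tables exist.
        return "SELECT name FROM sqlite_master WHERE type='table'", False
    if len(history) == 1:
        # Step 2: test a hypothesis about how city names are stored.
        return "SELECT DISTINCT city FROM customers", False
    # Step 3: commit to the final query.
    return "SELECT COUNT(*) FROM customers WHERE city = 'Campinas'", True


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, city TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(1, "Campinas"), (2, "Campinas"), (3, "Recife")])

final_sql, history = explore(conn, "How many customers are in Campinas?",
                             scripted_policy)
answer = conn.execute(final_sql).fetchall()
```

In this sketch, raising `max_steps` is the knob that corresponds to deeper exploration; in a real agent the scripted policy would be replaced by a model call that reads the accumulated history.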
