Heuristic-Guided Text-to-SQL Translation with LLMs: Optimizing Natural Language Interfaces for Relational Databases

  • Laura Petrola, Universidade Federal do Ceará (UFC)
  • Angelo Brayner, Universidade Federal do Ceará (UFC)
  • Wellington Franco, Universidade Federal do Ceará (UFC)

Abstract


The Text-to-SQL mapping process plays a crucial role in enabling non-technical users to interact with relational databases using natural language. While Large Language Models (LLMs) have shown promising results on benchmark datasets, their performance in real-world settings often deteriorates. In this work, we propose a modular and adaptive agent that leverages the open-source LLM DeepSeek to perform translations with integrated feedback and query optimization. Our agent is prompt-engineered to refine query generation based on user- or system-provided corrections. Using the TPC-H and Mondial benchmarks, as well as a real-world database, we demonstrate improved accuracy and execution efficiency, highlighting the impact of feedback loops and heuristic-based query rewriting.
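The abstract describes an agent that generates SQL with an LLM and refines it using user- or system-provided corrections. The following is a minimal sketch of only the system-provided side of such a loop, under assumptions that are not taken from the paper: the generator is any Python callable with a hypothetical `(question, schema, feedback)` signature standing in for DeepSeek, the target is an in-memory SQLite database with a toy TPC-H-style `nation` table, and the feedback is simply the DBMS error message. It does not reproduce the authors' prompts or their heuristic query rewriting; `translate_with_feedback` and `fake_llm` are illustrative names.

```python
import sqlite3
from typing import Callable, List, Optional, Tuple

# Hypothetical interface (assumption): (question, schema, feedback) -> SQL string.
# In the paper this role is played by the DeepSeek LLM; here any callable works.
Generator = Callable[[str, str, Optional[str]], str]


def schema_text(conn: sqlite3.Connection) -> str:
    """Return the CREATE TABLE statements, to be placed in the prompt."""
    rows = conn.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' AND sql IS NOT NULL"
    ).fetchall()
    return "\n".join(r[0] for r in rows)


def translate_with_feedback(
    conn: sqlite3.Connection,
    question: str,
    generate: Generator,
    max_attempts: int = 3,
) -> Tuple[str, List[tuple], int]:
    """Generate SQL, execute it, and feed execution errors back to the generator."""
    schema = schema_text(conn)
    feedback: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        sql = generate(question, schema, feedback)
        try:
            rows = conn.execute(sql).fetchall()
            return sql, rows, attempt  # the first query that executes is accepted
        except sqlite3.Error as exc:
            # System-provided correction: the DBMS error becomes the next feedback.
            feedback = f"Previous query failed: {sql}\nError: {exc}"
    raise RuntimeError(f"No executable query after {max_attempts} attempts.\n{feedback}")


def fake_llm(question: str, schema: str, feedback: Optional[str]) -> str:
    """Hypothetical stand-in for an LLM call: wrong column first, fixed after feedback."""
    if feedback is None:
        return "SELECT n_name FROM nation WHERE region = 'ASIA'"
    return "SELECT n_name FROM nation WHERE n_regionkey = 2 ORDER BY n_name"


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE nation (n_nationkey INTEGER PRIMARY KEY, n_name TEXT, n_regionkey INTEGER)"
    )
    # Toy rows for illustration only.
    conn.executemany(
        "INSERT INTO nation VALUES (?, ?, ?)",
        [(6, "FRANCE", 3), (8, "INDIA", 2), (9, "INDONESIA", 2), (12, "JAPAN", 2)],
    )
    sql, rows, attempts = translate_with_feedback(conn, "Which nations are in Asia?", fake_llm)
    print(attempts, rows)  # 2 [('INDIA',), ('INDONESIA',), ('JAPAN',)]
```

Accepting the first query that executes is a deliberate simplification: an executable query can still answer the wrong question, which is where user-provided corrections would enter the loop.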

Keywords: Text-to-SQL, Relational Databases, Query Optimization, LLM

Published
29/09/2025
PETROLA, Laura; BRAYNER, Angelo; FRANCO, Wellington. Heuristic-Guided Text-to-SQL Translation with LLMs: Optimizing Natural Language Interfaces for Relational Databases. In: SIMPÓSIO BRASILEIRO DE BANCO DE DADOS (SBBD), 40., 2025, Fortaleza/CE. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 126-139. ISSN 2763-8979. DOI: https://doi.org/10.5753/sbbd.2025.247037.