Exploring Temporal Text-to-SQL Challenges in Brazilian Portuguese: Lessons from Educational Data
Abstract
Recent advances in natural language processing have enabled translating natural language into SQL, but challenges remain in multilingual and temporal contexts. This short paper presents a beginner-level exploratory analysis of a prompt engineering strategy for text-to-SQL generation over Brazilian Portuguese educational data. Through 10 representative examples reflecting how real users might ask questions about open government data, we show how language variability, implicit temporal references, and mismatched expectations affect SQL generation and the reliability of standard evaluation metrics. This work contributes to historical data querying research and highlights persistent challenges for multilingual text-to-SQL systems.
Keywords:
Text-to-SQL, Brazilian Portuguese, temporal data, LLMs, educational data
Published
29/09/2025
How to Cite
FRÓES, Karina de Carvalho; BRAGHETTO, Kelly Rosa. Exploring Temporal Text-to-SQL Challenges in Brazilian Portuguese: Lessons from Educational Data. In: SIMPÓSIO BRASILEIRO DE BANCO DE DADOS (SBBD), 40., 2025, Fortaleza/CE. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 963-969. ISSN 2763-8979. DOI: https://doi.org/10.5753/sbbd.2025.247836.
