Database Modeling Automation from Natural Language Requirements

Júlia O. K. Menezes; Claudio E. C. Campelo

doi:10.5753/sbbd.2025.247302

Júlia O. K. Menezes Universidade Federal de Campina Grande (UFCG)
Claudio E. C. Campelo Universidade Federal de Campina Grande (UFCG)

Abstract

This paper proposes an approach to support relational database modeling by automatically generating Entity-Relationship (ER) diagrams from natural language requirements, using Large Language Models (LLMs) combined with prompt engineering. The method extracts entities, relationships, and attributes from textual descriptions and uses them to produce visual ER diagrams. The evaluation was conducted in two phases: first, different LLMs were tested against structured criteria; second, the results were validated with students familiar with database modeling. The results indicate that the generated diagrams generally align well with the original requirements, although challenges remain in handling cardinalities and nullable constraints. These findings reinforce the potential of LLMs to enhance conceptual database modeling.
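The extraction step described above can be pictured as turning free-text requirements into a structured intermediate model that is then rendered as a diagram. The following is a minimal Python sketch built on assumptions the paper does not state. It assumes that the LLM is prompted to return JSON in a hypothetical schema (`entities`, `relationships`, and cardinalities written as `min..max`), and that the diagram is emitted as Mermaid `erDiagram` text. The LLM call itself is omitted. `LLM_OUTPUT` is an illustrative hard-coded response, and `validate` and `to_mermaid` are hypothetical helper names, not the authors' implementation.

```python
import json

# Hypothetical structured output an LLM could be prompted to emit for the
# requirement "A customer places orders; each order belongs to exactly one
# customer." The schema is an assumption for illustration only.
LLM_OUTPUT = """
{
  "entities": [
    {"name": "Customer", "attributes": [
      {"name": "id", "type": "int", "pk": true},
      {"name": "name", "type": "string"}]},
    {"name": "Order", "attributes": [
      {"name": "id", "type": "int", "pk": true},
      {"name": "placed_at", "type": "date"}]}
  ],
  "relationships": [
    {"name": "places", "left": "Customer", "right": "Order",
     "left_card": "1..1", "right_card": "0..N"}
  ]
}
"""

# Mermaid crow's-foot markers. Each marker sits next to the entity it counts:
# "left_card" = how many `left` rows relate to one `right` row, and vice versa.
# A minimum of 0 marks optional participation (e.g. a nullable foreign key).
LEFT_MARK = {"0..1": "|o", "1..1": "||", "0..N": "}o", "1..N": "}|"}
RIGHT_MARK = {"0..1": "o|", "1..1": "||", "0..N": "o{", "1..N": "|{"}


def validate(model: dict) -> None:
    """Reject relationships that reference unknown entities or cardinalities."""
    names = {e["name"] for e in model["entities"]}
    for rel in model["relationships"]:
        for side in ("left", "right"):
            if rel[side] not in names:
                raise ValueError(f"unknown entity {rel[side]!r} in {rel['name']!r}")
        for key in ("left_card", "right_card"):
            if rel[key] not in LEFT_MARK:
                raise ValueError(f"unsupported cardinality {rel[key]!r}")


def to_mermaid(model: dict) -> str:
    """Render the extracted ER model as Mermaid erDiagram source text."""
    lines = ["erDiagram"]
    for ent in model["entities"]:
        lines.append(f"    {ent['name']} {{")
        for attr in ent["attributes"]:
            suffix = " PK" if attr.get("pk") else ""
            lines.append(f"        {attr['type']} {attr['name']}{suffix}")
        lines.append("    }")
    for rel in model["relationships"]:
        left = LEFT_MARK[rel["left_card"]]
        right = RIGHT_MARK[rel["right_card"]]
        lines.append(f"    {rel['left']} {left}--{right} {rel['right']} : {rel['name']}")
    return "\n".join(lines)


if __name__ == "__main__":
    model = json.loads(LLM_OUTPUT)
    validate(model)
    print(to_mermaid(model))
```

For the sample input, the rendered relationship line is `Customer ||--o{ Order : places`: each order has exactly one customer, and each customer has zero or more orders. Keeping the minimum cardinality (`0` versus `1`) explicit in the intermediate format is one way to expose the optional-participation decisions, which the evaluation identifies as a remaining challenge. A validation pass before rendering also gives a natural place to reject malformed LLM output.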

Keywords: Entity-Relationship Diagrams, Large Language Models, LLMs, Databases