Do LLMs support generating code that is compliant with creational design patterns?

Caio S. Machado; Eduardo Kugler Viegas; Silvana Morita Melo; Leo Natan Paschoal

doi:10.5753/wbots.2025.15181

Caio S. Machado (USP)
Eduardo Kugler Viegas (PUC-PR)
Silvana Morita Melo (UFGD)
Leo Natan Paschoal (PUC-PR)

Abstract

Large Language Models (LLMs) are increasingly used in Software Engineering; however, their reliability in applying established practices such as design patterns remains uncertain. This study investigates whether ChatGPT, Gemini, and Copilot can generate code that adheres to creational design patterns. The analysis considers the correctness of the implementations, as well as the complexity and maintainability of the generated code. To this end, a comparative study was conducted on 150 programs generated by the models and 50 human-written reference programs. The results show that, although the LLMs are generally capable of applying design patterns, a critical failure rate of 22.67% was observed, with significant performance variation across models and design patterns. Regarding code complexity, no substantial differences were identified; in most cases, however, the LLM-generated artifacts exhibited higher maintainability than the human-written reference implementations.
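For readers unfamiliar with the term, the following is a minimal sketch of one creational design pattern, Factory Method, as catalogued by Gamma et al. It is illustrative only and is not taken from the study: the choice of Python and every class and method name (`Document`, `Exporter`, `create_document`, and so on) are assumptions made for this example.

```python
from abc import ABC, abstractmethod


class Document(ABC):
    """Product interface: what every created object must offer (hypothetical)."""

    @abstractmethod
    def render(self) -> str:
        ...


class PdfDocument(Document):
    def render(self) -> str:
        return "PDF content"


class HtmlDocument(Document):
    def render(self) -> str:
        return "<p>HTML content</p>"


class Exporter(ABC):
    """Creator: declares the factory method and uses the product it returns."""

    @abstractmethod
    def create_document(self) -> Document:
        """Factory method: subclasses decide which concrete Document to build."""

    def export(self) -> str:
        # The creator depends only on the Document interface,
        # never on a concrete product class.
        return self.create_document().render()


class PdfExporter(Exporter):
    def create_document(self) -> Document:
        return PdfDocument()


class HtmlExporter(Exporter):
    def create_document(self) -> Document:
        return HtmlDocument()
```

In this sketch, an implementation would fail to follow the pattern if, for instance, `Exporter.export` instantiated `PdfDocument` directly instead of delegating to `create_document`. That is one plausible kind of deviation a correctness analysis could look for, although the abstract does not state the criteria actually used.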
