Model-Driven Engineering for Implementation and Testing of Large Language Model Architectures

Jesús Carreño-Bolufer, UPV

DOI: https://doi.org/10.5753/cibse.2025.35315

Abstract

Large Language Model (LLM) architectures such as DeepSeek-V3 show that efficient architectural design can reduce computational costs. Their development, however, reveals technical debt with respect to software engineering principles, caused in part by the multidisciplinary nature of the field. This debt increases development costs and creates software quality challenges. To address these issues, this thesis proposes integrating Model-Driven Engineering (MDE) into the LLM development life cycle. A conceptual metamodel formalises LLM architectural constructs (RQ1), enabling automated code generation via model transformations (RQ2) and facilitating model-based testing (RQ3).

Keywords: Model-Driven Engineering, Large Language Models, Software Engineering, SE4AI, MDE4AI, Model-Driven Testing
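
The pipeline outlined in the abstract (a metamodel of architectural constructs, a model transformation that generates code, and tests derived from the model) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the names `Layer`, `Architecture`, `validate`, `generate_code` and `run_model_based_tests`, the layer kinds, and the generated class layout are hypothetical and are not taken from the thesis.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Layer:
    """Hypothetical metamodel element: one architectural building block."""
    name: str      # must be a valid Python identifier in this sketch
    kind: str      # illustrative kinds such as "attention" or "feed_forward"
    in_dim: int
    out_dim: int


@dataclass
class Architecture:
    """Hypothetical metamodel root: an ordered stack of layers."""
    name: str
    layers: List[Layer] = field(default_factory=list)


def validate(model: Architecture) -> List[str]:
    """Model-level check: consecutive layers must agree on dimensions."""
    errors = []
    for prev, nxt in zip(model.layers, model.layers[1:]):
        if prev.out_dim != nxt.in_dim:
            errors.append(
                f"{prev.name} -> {nxt.name}: {prev.out_dim} != {nxt.in_dim}"
            )
    return errors


def generate_code(model: Architecture) -> str:
    """Model-to-text transformation: emit a plain-Python class skeleton."""
    lines = [f"class {model.name}:", "    def __init__(self):"]
    if not model.layers:
        lines.append("        pass")
    for layer in model.layers:
        lines.append(
            f"        self.{layer.name} = "
            f"({layer.kind!r}, {layer.in_dim}, {layer.out_dim})"
        )
    return "\n".join(lines) + "\n"


def run_model_based_tests(model: Architecture, source: str) -> bool:
    """Derive expectations from the model and check the generated code."""
    namespace = {}
    exec(source, namespace)               # load the generated class
    instance = namespace[model.name]()    # instantiate it
    return all(
        getattr(instance, layer.name) == (layer.kind, layer.in_dim, layer.out_dim)
        for layer in model.layers
    )


# A toy model instance; the dimensions are arbitrary illustration values.
toy = Architecture(
    "ToyDecoder",
    [
        Layer("attn", "attention", 64, 64),
        Layer("ffn", "feed_forward", 64, 64),
    ],
)
```

In this sketch, `validate(toy)` checks the model before any code exists, `generate_code(toy)` produces the source text, and `run_model_based_tests(toy, generate_code(toy))` confirms that the generated class matches the model it came from. A real MDE toolchain would presumably use a dedicated metamodelling framework and target a deep learning library instead of plain tuples.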
