Automating Cloud Infrastructure Provisioning with Semantically-Enriched Large Language Models

  • Weslley Paulo UFPE
  • Breno Vasconcelos UFPE
  • Carlos Ferraz UFPE

Abstract


The complexity of provisioning multi-cloud infrastructure has created a significant automation bottleneck, and while Large Language Models (LLMs) offer a promising solution, they consistently fail to generate reliable and deployable Infrastructure as Code (IaC) due to inherent ambiguity. To address this critical reliability gap, we propose a novel methodology that significantly improves IaC generation by augmenting LLM prompts with structured semantic context. Our approach uses OWL ontologies to formally model key infrastructure concepts, grounding the LLM in a machine-readable representation of the domain. This semantic enrichment provides the specific, structured context needed to resolve ambiguity and improve the accuracy of the generated Terraform code. We evaluate our approach on the IaC-Eval benchmark, comparing our semantically-enriched method against standard prompting strategies. Experimental results demonstrate a definitive improvement: our approach achieves a mean functional accuracy of 64.3%, a 126.4% increase over the baseline average of 28.4%. Syntactic validity also improved substantially, with Terraform plan validation rates increasing by an average of 29.6%. These findings demonstrate that formal semantic grounding is a critical and highly effective technique for building reliable, LLM-driven automation for complex cloud environments.

Keywords: Infrastructure as Code, Large Language Models, Prompt Engineering, Semantic Web, Ontologies, OWL, Cloud Computing, Code Generation, Natural Language Processing

References

JohnBosco Agbaegbu, Oluwasefunmi Tale Arogundade, Sanjay Misra, and Robertas Damaševičius. 2021. Ontologies in Cloud Computing—Review and Future Directions. Future Internet 13, 12 (2021). DOI: 10.3390/fi13120302

Dean Allemang and Juan Sequeda. 2024. Increasing the LLM Accuracy for Question Answering: Ontologies to the Rescue! arXiv:2405.11706 [cs.AI] [link]

D Androcec, N Vrcek, and J Seva. 2012. Cloud computing ontologies: A systematic review. Proceedings of the third . . . (2012). [link]

D.M. Anisuzzaman, Jeffrey G. Malins, Paul A. Friedman, and Zachi I. Attia. 2025. Fine-Tuning Large Language Models for Specialized Use Cases. Mayo Clinic Proceedings: Digital Health 3, 1 (2025), 100184. DOI: 10.1016/j.mcpdig.2024.11.005

Christina Antoniou and Nick Bassiliades. 2025. Utilizing LLMs and ontologies to query educational knowledge graphs. In Proceedings of the 28th Pan-Hellenic Conference on Progress in Computing and Informatics (PCI ’24). Association for Computing Machinery, New York, NY, USA, 287–295. DOI: 10.1145/3716554.3716598

Chetan Arora, Ahnaf Ibn Sayeed, Sherlock Licorish, Fanyu Wang, and Christoph Treude. 2024. Optimizing large language model hyperparameters for code generation. arXiv preprint arXiv:2408.10577 (2024).

Patrick Bareiß, Beatriz Souza, Marcelo d’Amorim, and Michael Pradel. 2022. Code Generation Tools (Almost) for Free? A Study of Few-Shot, Pre-Trained Language Models on Code. arXiv:2206.01335 [cs.SE] [link]

Wei-Lin Chiang, Lianmin Zheng, Ying Sheng, Anastasios Nikolas Angelopoulos, Tianle Li, Dacheng Li, Hao Zhang, Banghua Zhu, Michael Jordan, Joseph E. Gonzalez, and Ion Stoica. 2024. Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference. arXiv:2403.04132 [cs.AI] [link]

Cloud Native Computing Foundation. 2025. Open Policy Agent (OPA). [link].

William G. Cochran. 1977. Sampling Techniques (3rd ed.). Wiley.

Angela Fan, Beliz Gokkaya, Mark Harman, Mitya Lyubarskiy, Shubho Sengupta, Shin Yoo, and Jie M. Zhang. 2023. Large Language Models for Software Engineering: Survey and Open Problems. In 2023 IEEE/ACM International Conference on Software Engineering: Future of Software Engineering (ICSE-FoSE). IEEE Computer Society, Los Alamitos, CA, USA, 31–53. DOI: 10.1109/ICSEFoSE59343.2023.00008

Yingqiang Ge, Wenyue Hua, Kai Mei, Jianchao Ji, Juntao Tan, Shuyuan Xu, Zelong Li, and Yongfeng Zhang. 2023. OpenAGI: When LLM Meets Domain Experts. In Advances in Neural Information Processing Systems, A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, and S. Levine (Eds.), Vol. 36. Curran Associates, Inc., 5539–5568. [link]

Michele Guerriero, Martin Garriga, Damian A. Tamburri, and Fabio Palomba. 2019. Adoption, Support, and Challenges of Infrastructure-as-Code: Insights from Industry. In 2019 IEEE International Conference on Software Maintenance and Evolution (ICSME). 580–589. DOI: 10.1109/ICSME.2019.00092

HashiCorp, an IBM company. 2025. HCL – HashiCorp Configuration Language. [link].

Xinyi Hou, Yanjie Zhao, Yue Liu, Zhou Yang, Kailong Wang, Li Li, Xiapu Luo, David Lo, John Grundy, and Haoyu Wang. 2024. Large Language Models for Software Engineering: A Systematic Literature Review. ACM Trans. Softw. Eng. Methodol. 33, 8, Article 220 (Dec. 2024), 79 pages. DOI: 10.1145/3695988

Harshit Joshi, José Cambronero Sanchez, Sumit Gulwani, Vu Le, Ivan Radiček, and Gust Verbruggen. 2023. Repair is nearly generation: multilingual program repair with LLMs. In Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence and Thirty-Fifth Conference on Innovative Applications of Artificial Intelligence and Thirteenth Symposium on Educational Advances in Artificial Intelligence (AAAI’23/IAAI’23/EAAI’23). AAAI Press, Article 573, 10 pages. DOI: 10.1609/aaai.v37i4.25642

Patrick Tser Jern Kon, Jiachen Liu, Yiming Qiu, Weijun Fan, Ting He, Lei Lin, Haoran Zhang, Owen M. Park, George S. Elengikal, Yuxin Kang, Ang Chen, Mosharaf Chowdhury, Myungjin Lee, and Xinyu Wang. 2024. IaC-Eval: A Code Generation Benchmark for Cloud Infrastructure-as-Code Programs. In Advances in Neural Information Processing Systems, A. Globerson, L. Mackey, D. Belgrave, A. Fan, U. Paquet, J. Tomczak, and C. Zhang (Eds.), Vol. 37. Curran Associates, Inc., 134488–134506. [link]

D Krech. 2006. Rdflib: A python library for working with rdf. Online [link] (2006).

Michael R. Lyu, Baishakhi Ray, Abhik Roychoudhury, Shin Hwei Tan, and Patanamon Thongtanunam. 2025. Automatic Programming: Large Language Models and Beyond. ACM Trans. Softw. Eng. Methodol. 34, 5, Article 140 (May 2025), 33 pages. DOI: 10.1145/3708519

Benjamin Marie. 2022. BLEU: A Misunderstood Metric from Another Age. [link]. Accessed November 2022.

Beniamino Di Martino, Giuseppina Cretella, and Antonio Esposito. 2013. Semantic and Agnostic Representation of Cloud Patterns for Cloud Interoperability and Portability. In 2013 IEEE 5th International Conference on Cloud Computing Technology and Science, Vol. 2. 182–187. DOI: 10.1109/CloudCom.2013.123

DL McGuinness and F Van Harmelen. 2004. OWL web ontology language overview. W3C recommendation (2004). [link]

Senthamarai N, Jeyaselvi M, and Hemamalini V. 2025. Automatic Cloud Formation Using LLM. In 2025 International Conference on Intelligent and Cloud Computing (ICoICC). 1–6. IEEE Xplore document 11052114.

Bastian Quilitz and Ulf Leser. 2008. Querying Distributed RDF Data Sources with SPARQL. In The Semantic Web: Research and Applications, Sean Bechhofer, Manfred Hauswirth, Jörg Hoffmann, and Manolis Koubarakis (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 524–538.

Ehud Reiter. 2018. A Structured Review of the Validity of BLEU. Computational Linguistics 44, 3 (Sept. 2018), 393–401. DOI: 10.1162/coli_a_00322

Matthew Renze. 2024. The Effect of Sampling Temperature on Problem Solving in Large Language Models. In Findings of the Association for Computational Linguistics: EMNLP 2024, Yaser Al-Onaizan, Mohit Bansal, and Yun-Nung Chen (Eds.). Association for Computational Linguistics, Miami, Florida, USA, 7346–7356. DOI: 10.18653/v1/2024.findings-emnlp.432

Robson Santos, Italo Santos, Cleyton Magalhaes, and Ronnie de Souza Santos. 2024. Are We Testing or Being Tested? Exploring the Practical Applications of Large Language Models in Software Testing. In 2024 IEEE Conference on Software Testing, Verification and Validation (ICST). IEEE Computer Society, Los Alamitos, CA, USA, 353–360. DOI: 10.1109/ICST60714.2024.00039

Dhruv Seth, Harshavardhan Nerella, Madhavi Najana, and Ayisha Tabbassum. 2024. Navigating the Multi-Cloud Maze: Benefits, Challenges, and Future Trends. International Journal of Global Innovations and Solutions (IJGIS) (June 2024). [link].

Tomasz Szandała. 2025. AIOps for Reliability: Evaluating Large Language Models for Automated Root Cause Analysis in Chaos Engineering. In Computational Science – ICCS 2025 Workshops, Maciej Paszynski, Amanda S. Barnard, and Yongjie Jessica Zhang (Eds.). Springer Nature Switzerland, Cham, 323–336.

Anfu Tang, Laure Soulier, and Vincent Guigue. 2025. Clarifying Ambiguities: on the Role of Ambiguity Types in Prompting Methods for Clarification Generation. In Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval (Padua, Italy) (SIGIR ’25). Association for Computing Machinery, New York, NY, USA, 20–30. DOI: 10.1145/3726302.3729922

Liang Zhang, Katherine Jijo, Spurthi Setty, Eden Chung, Fatima Javid, Natan Vidra, and Tommy Clifford. 2024. Enhancing Large Language Model Performance To Answer Questions and Extract Information More Accurately. arXiv:2402.01722 [cs.CL] [link]

Quanjun Zhang, Chunrong Fang, Yang Xie, Yaxin Zhang, Yun Yang, Weisong Sun, Shengcheng Yu, and Zhenyu Chen. 2024. A Survey on Large Language Models for Software Engineering. arXiv:2312.15223 [cs.SE] [link]

Tianyi Zhang, Shidong Pan, Zejun Zhang, Zhenchang Xing, and Xiaoyu Sun. 2025. Deployability-Centric Infrastructure-as-Code Generation: An LLM-based Iterative Framework. arXiv:2506.05623 [cs.SE] [link]

Jiawei Zheng, Hanghai Hong, Feiyan Liu, Xiaoli Wang, Jingsong Su, Yonggui Liang, and Shikai Wu. 2024. Fine-tuning Large Language Models for Domain-specific Machine Translation. arXiv:2402.15061 [cs.CL] [link]

Álvaro Barbero Jiménez. 2024. An evaluation of LLM code generation capabilities through graded exercises. arXiv:2410.16292 [cs.SE] [link]
Published
2025-11-10
PAULO, Weslley; VASCONCELOS, Breno; FERRAZ, Carlos. Automating Cloud Infrastructure Provisioning with Semantically-Enriched Large Language Models. In: BRAZILIAN SYMPOSIUM ON MULTIMEDIA AND THE WEB (WEBMEDIA), 31., 2025, Rio de Janeiro/RJ. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 77–85. DOI: https://doi.org/10.5753/webmedia.2025.16102.