Analysis of the Use of Large Language Models in Code Generation for Home Automation
Abstract
This paper presents a comparative experimental evaluation of Large Language Models (LLMs) for generating YAML code for home automation on the Home Assistant (HA) platform. Four models were tested: GPT-4o, GPT-4.5, Gemini 2.5 Flash, and Gemini 2.5 Pro. The tests covered three task complexity levels and two prompt types. Performance was analyzed with a 2^k factorial design and validated with analysis of variance (ANOVA). Task complexity was the main factor affecting success, while more specific prompts reduced errors. All models succeeded on basic tasks, but only Gemini 2.5 Pro still produced functional code in complex scenarios. Error analysis showed that logic errors were the most frequent, followed by syntax and entity errors. A locally run fine-tuned model (DeepSeek-r1:14B) was also evaluated; it handled simpler tasks but required manual adjustments. The results highlight the importance of model robustness, prompt quality, and deployment context in LLM-based code generation for home automation.
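The abstract states that performance was analyzed with a 2^k factorial design and validated with ANOVA. As a minimal sketch of how such an analysis works, the Python code below applies the standard contrast-based formulas for replicated 2^k designs (effect = contrast / (n·2^(k-1)), SS = contrast² / (n·2^k), each effect with one degree of freedom) to a hypothetical 2^2 design. It assumes two coded factors, A (task complexity) and B (prompt specificity), each restricted to two levels for illustration, with n = 3 replicates per cell. The factor coding, the function name `factorial_2k_anova` and every response value are illustrative assumptions, not the paper's data or its actual experimental setup.

```python
from itertools import combinations
from math import prod
from statistics import mean

# HYPOTHETICAL data for a 2^2 design with n = 3 replicates per cell.
# Keys are coded factor levels (A, B):
#   A = task complexity    (-1 = low, +1 = high)      [assumed two-level coding]
#   B = prompt specificity (-1 = generic, +1 = specific)
# Values are illustrative success scores, NOT results from the paper.
data = {
    (-1, -1): [8.0, 7.0, 9.0],
    (+1, -1): [3.0, 2.0, 4.0],
    (-1, +1): [9.0, 10.0, 9.0],
    (+1, +1): [5.0, 6.0, 4.0],
}


def factorial_2k_anova(data, k):
    """Estimate effects, sums of squares and F statistics for a replicated 2^k design.

    Each key of `data` is a tuple of k coded levels (-1 or +1); each value is the
    list of n replicate responses for that treatment combination.
    """
    n = len(next(iter(data.values())))
    # Cell totals are the building blocks of every contrast.
    totals = {cell: sum(values) for cell, values in data.items()}

    results = {}
    # Every non-empty subset of factor indices is one effect:
    # (0,) = main effect of A, (1,) = main effect of B, (0, 1) = AB interaction.
    for order in range(1, k + 1):
        for subset in combinations(range(k), order):
            # Contrast: cell totals weighted by the product of the coded levels.
            contrast = sum(totals[cell] * prod(cell[i] for i in subset) for cell in totals)
            results[subset] = {
                "effect": contrast / (n * 2 ** (k - 1)),
                "ss": contrast ** 2 / (n * 2 ** k),  # 1 degree of freedom, so MS = SS
            }

    # Pure error: variation of the replicates around their own cell mean.
    ss_error = sum(sum((y - mean(values)) ** 2 for y in values) for values in data.values())
    df_error = 2 ** k * (n - 1)
    ms_error = ss_error / df_error

    for stats in results.values():
        stats["f"] = stats["ss"] / ms_error  # compare against F(1, df_error)
    return results, ms_error


if __name__ == "__main__":
    results, ms_error = factorial_2k_anova(data, k=2)
    names = "AB"
    for subset, stats in results.items():
        label = "".join(names[i] for i in subset)
        print(f"{label}: effect={stats['effect']:.3f} SS={stats['ss']:.3f} F={stats['f']:.2f}")
    print(f"MS_error={ms_error:.3f}")
```

Running the script prints one line per effect (A, B and the AB interaction). With real data, each F value would be compared against the critical value of the F distribution with 1 and 2^k(n - 1) degrees of freedom, or converted to a p-value with a statistics library, to decide which factors significantly affect success. Working from cell totals keeps the same code valid for any number of two-level factors k.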
