A Comparative Analysis of Proprietary and Open-Weight LLMs for Fault Injection in Private Clouds

  • Guilherme Silva Duarte UFRPE
  • Erica Teixeira Gomes de Sousa UFRPE
  • Carlos Manoel Nunes e Silva UFRPE

Abstract

The expansion of cloud computing makes dependability assessment vital for mitigating failures. Since software fault injection is a technique used for this purpose, this work seeks to democratize its application by comparing the effectiveness of the Gemini-2.5-flash and GPT-OSS-120b models in injecting faults into private cloud environments. The results indicate that the proprietary model (Gemini-2.5-flash) achieved a 90% success rate, whereas the open-weight model (GPT-OSS-120b) exhibited spatial-reference errors. A notable finding was the detection of AI-generated "Gray Failures", in which services remain active in monitoring yet are functionally inoperative, undermining traditional observability.
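To make the gray-failure notion concrete, the condition can be sketched as a service whose liveness signal still passes while its functional path is broken. This is a minimal hypothetical simulation: the class and method names are illustrative and do not come from the paper's experiments.

```python
class CloudService:
    """Simulates a gray failure: the liveness signal reports
    'healthy' while the functional path is broken."""

    def __init__(self):
        self._process_alive = True  # process is still running
        self._backend_ok = True     # functional dependency is reachable

    def inject_gray_failure(self):
        # Fault injection: break only the functional path,
        # leaving the liveness signal untouched.
        self._backend_ok = False

    def health_check(self) -> bool:
        # Traditional monitoring: checks only that the process is up,
        # so it keeps reporting the service as healthy.
        return self._process_alive

    def handle_request(self, payload: str) -> str:
        # Functional path: fails once the backend is broken.
        if not self._backend_ok:
            raise RuntimeError("backend unreachable")
        return payload.upper()


svc = CloudService()
svc.inject_gray_failure()
print(svc.health_check())   # monitoring still reports True ("healthy")
try:
    svc.handle_request("ping")
except RuntimeError as exc:
    print("request failed:", exc)
```

The sketch shows why such failures escape traditional observability: any monitor that polls only `health_check()` never sees the functional outage.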

Published
25 May 2026
DUARTE, Guilherme Silva; SOUSA, Erica Teixeira Gomes de; NUNES E SILVA, Carlos Manoel. Uma Análise Comparativa entre LLMs Proprietários e de Pesos Abertos na Injeção de Falhas de Nuvens Privadas. In: WORKSHOP DE TESTES E TOLERÂNCIA A FALHAS (WTF), 27., 2026, Praia do Forte/BA. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2026. p. 219-231. ISSN 2595-2684. DOI: https://doi.org/10.5753/wtf.2026.22985.