Investigating the Reliability and Effectiveness of LLMs in Performance Optimization: An Exploratory Study with Two Web Servers

Pedro Jardelino Neto; Nabor C. Mendonça

doi:10.5753/wperformance.2026.23158

Pedro Jardelino Neto UNIFOR
Nabor C. Mendonça UNIFOR


Abstract

This paper evaluates the operational reliability and optimization effectiveness of four state-of-the-art LLMs acting as web server optimizers in a controlled experimental environment. Across 48 trials, 66.7% of the responses preserved the system's operability, but only 43.8% produced effective optimizations, while 33.3% damaged the target server. Among the models, Gemini 3 Pro achieved the best overall performance (75.0%) and DeepSeek-V3.2 the worst (16.7%). The strongest effect, however, came from the target server: Apache reached a 66.7% success rate, versus 20.8% for Nginx (p = 0.003). Taken together, the results indicate that, even under favorable conditions, using LLMs to optimize web servers still combines limited effectiveness with considerable operational risk.
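The per-server comparison can be sanity-checked from the percentages alone. The sketch below is illustrative only and rests on two assumptions not stated in this section: that the 48 trials were split evenly (24 per server), and that a two-sided Fisher exact test is a reasonable stand-in for whatever test the authors actually used. The helper name `fisher_exact_two_sided` is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x successes in row 1, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Sum over every table that is no more probable than the observed one.
    return sum(
        p for p in (prob(x) for x in range(lo, hi + 1))
        if p <= p_obs * (1 + 1e-9)
    )


# Assumption: 48 trials split evenly across the two servers.
TRIALS_PER_SERVER = 24

# Recover success counts from the reported rates (66.7% and 20.8%).
apache_ok = round(0.667 * TRIALS_PER_SERVER)
nginx_ok = round(0.208 * TRIALS_PER_SERVER)

p_value = fisher_exact_two_sided(
    apache_ok, TRIALS_PER_SERVER - apache_ok,
    nginx_ok, TRIALS_PER_SERVER - nginx_ok,
)

overall_rate = (apache_ok + nginx_ok) / (2 * TRIALS_PER_SERVER)
print(f"Apache successes: {apache_ok}, Nginx successes: {nginx_ok}")
print(f"Overall success rate: {overall_rate:.1%}")
print(f"Fisher exact p-value (two-sided): {p_value:.4f}")
```

Under these assumptions the recovered counts (16 and 5 successes) add up to the reported 43.8% overall rate, and the p-value lands in the same range as the reported p = 0.003. This is a consistency check on the summary figures, not a reproduction of the authors' analysis.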
