Effectiveness of Parameter Optimization of Proximal Policy Optimization for Agents in a Digital Game: A Comparative Study

Cristhian S. Minoves; André R. da Cruz

doi:10.5753/wsis.2025.15254

Cristhian S. Minoves CEFET-MG
André R. da Cruz CEFET-MG


Abstract

Autonomous non-human characters have become fundamental as demand for immersive environments in digital games grows. However, optimizing the configuration of these intelligent agents is a complex challenge for developers, given the intrinsic nature of their models and the large number of parameters involved. This work compares the effectiveness of two parameter tuning heuristics: one based on Bayesian optimization via Gaussian Processes, and another based on an iterated racing procedure using the iRace method. Both heuristics were applied to tuning the parameters of Proximal Policy Optimization (PPO), a neural-network-based technique, in order to train an agent to play Push The Block. To this end, a computational experiment was conducted. After tuning, the optimized parameter sets, together with the default configuration, were tested over an extended time horizon. The results indicated that the tuning performed by iRace outperformed the other approaches, yielding a parameter set that significantly improved the agent's effectiveness.

Keywords: Parameter tuning, Digital games, Proximal Policy Optimization
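For background on what is being tuned, the sketch below computes the clipped surrogate objective that PPO maximizes, as defined in the original PPO paper. The clip range ε is one of the PPO hyperparameters that tuners such as Bayesian optimization or iRace can adjust. This is a minimal plain-Python illustration under stated assumptions. The function name `ppo_clip_objective`, its per-sample log-probability inputs, and the default ε = 0.2 are illustrative choices. They are not the configuration or the parameter sets used in this study.

```python
import math


def ppo_clip_objective(new_logp, old_logp, advantages, epsilon=0.2):
    """Mean PPO clipped surrogate objective L^CLIP over a batch (to be maximized).

    Hypothetical helper for illustration only.
    new_logp / old_logp: log-probabilities of the taken actions under the
    current policy and the policy that collected the data.
    advantages: advantage estimates for the same samples.
    epsilon: clip range; 0.2 is an illustrative default, not a tuned value.
    """
    if not advantages or not (len(new_logp) == len(old_logp) == len(advantages)):
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        # Probability ratio r_t = pi_new(a|s) / pi_old(a|s), computed from log-probs
        ratio = math.exp(lp_new - lp_old)
        # Clip the ratio into [1 - epsilon, 1 + epsilon]
        clipped = min(max(ratio, 1.0 - epsilon), 1.0 + epsilon)
        # Pessimistic bound: keep the smaller of the unclipped and clipped terms
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

For example, with a probability ratio of 1.5 and an advantage of 1.0, the term is clipped to 1.2. With an advantage of -1.0, the unclipped -1.5 is kept. The objective therefore offers no extra reward for moving the policy beyond the clip range in a single update, and a larger or smaller ε changes how aggressive each update can be.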
