Effectiveness of Parameter Optimization of Proximal Policy Optimization for Agents in a Digital Game: A Comparative Study
Abstract
The role of autonomous non-human characters has become crucial with the growing demand for immersive environments in digital games. However, optimizing the configuration of these intelligent agents is a complex challenge for developers, given the intrinsic nature of their models and the vast number of parameters involved. This work compares the effectiveness of two parameter-tuning heuristics: one based on Bayesian optimization via Gaussian Processes and the other on an iterated racing procedure using the iRace method. Both heuristics were applied to tune the parameters of Proximal Policy Optimization (PPO), a neural network-based technique, in order to train an agent to play Push the Block. To this end, a computational experiment was conducted. After tuning, the optimized parameter sets, along with the default configuration, were tested over an extended time horizon. The results indicated that the tuning performed by iRace outperformed the other approaches, yielding a parameter set that significantly improved the agent’s effectiveness.
Keywords:
Parameter tuning, Digital Games, Proximal Policy Optimization
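To make the tuning setup described in the abstract concrete, the sketch below illustrates, under stated assumptions, how Bayesian optimization via Gaussian Processes could be applied to a handful of PPO hyperparameters using the scikit-optimize library. This is not the authors' experimental code: the search ranges are illustrative, and the helper train_and_evaluate_agent is a hypothetical stand-in for launching a Unity ML-Agents PPO training run and measuring the agent's mean cumulative reward.

from skopt import gp_minimize
from skopt.space import Real, Integer

# Illustrative search space over a few common PPO hyperparameters.
search_space = [
    Real(1e-5, 1e-3, prior="log-uniform", name="learning_rate"),
    Integer(64, 1024, name="batch_size"),
    Real(0.1, 0.3, name="epsilon"),    # PPO clipping range
    Real(0.9, 0.999, name="gamma"),    # discount factor
]

def train_and_evaluate_agent(learning_rate, batch_size, epsilon, gamma):
    # Hypothetical stand-in for launching a Unity ML-Agents PPO training run
    # and measuring the agent's mean cumulative reward on Push the Block.
    # Replaced here by a synthetic score so the sketch executes end to end.
    return -((learning_rate - 3e-4) ** 2) * 1e6 - abs(epsilon - 0.2)

def objective(params):
    learning_rate, batch_size, epsilon, gamma = params
    mean_reward = train_and_evaluate_agent(learning_rate, batch_size, epsilon, gamma)
    return -mean_reward  # gp_minimize minimizes, so negate the reward

# Fit a Gaussian Process surrogate to past evaluations and query it for
# promising configurations, one costly training run per call.
result = gp_minimize(objective, search_space, n_calls=30, random_state=0)
print("Best hyperparameters found:", result.x)

By contrast, the iterated racing approach compared in the paper (iRace) would explore an analogous parameter space by sampling many configurations and discarding poor ones early on the basis of statistical tests, rather than fitting a surrogate model.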
References
Adil, K., Jiang, F., Liu, S., Grigorev, A., Gupta, B., and Rho, S. (2017). Training an agent for FPS Doom game using visual reinforcement learning and ViZDoom. International Journal of Advanced Computer Science and Applications, 8(12).
Bardenet, R., Brendel, M., Kégl, B., and Sebag, M. (2013). Collaborative hyperparameter tuning. In International Conference on Machine Learning, pages 199–207. PMLR.
Derrac, J., García, S., Molina, D., and Herrera, F. (2011). A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm intelligence algorithms. Swarm and Evolutionary Computation, 1(1):3–18.
Haarnoja, T., Zhou, A., Hartikainen, K., Tucker, G., Ha, S., Tan, J., Kumar, V., Zhu, H., Gupta, A., Abbeel, P., et al. (2018). Soft actor-critic algorithms and applications. arXiv preprint arXiv:1812.05905.
Hansen, N., Müller, S. D., and Koumoutsakos, P. (2003). Reducing the time complexity of the derandomized evolution strategy with covariance matrix adaptation (CMA-ES). Evolutionary Computation, 11(1):1–18.
Juliani, A., Berges, V.-P., Teng, E., Cohen, A., Harper, J., Elion, C., Goy, C., Gao, Y., Henry, H., Mattar, M., and Lange, D. (2020). Unity: A general platform for intelligent agents. arXiv preprint arXiv:1809.02627.
Kishimoto, A. (2004). Inteligência artificial em jogos eletrônicos. Academic research about Artificial Intelligence for games.
Lai, J., Chen, X.-l., and Zhang, X.-Z. (2019). Training an agent for third-person shooter game using Unity ML-Agents. In International Conference on Artificial Intelligence and Computing Science, Hangzhou, pages 317–332.
Lanham, M. (2018). Learn Unity ML-Agents–Fundamentals of Unity Machine Learning: Incorporate new powerful ML algorithms such as Deep Reinforcement Learning for games. Packt Publishing Ltd.
Liu, Z., Chai, J., Zhu, X., Tang, S., Ye, R., Zhang, B., Bai, L., and Chen, S. (2025). ML-Agent: Reinforcing LLM agents for autonomous machine learning engineering. arXiv preprint arXiv:2505.23723.
López-Ibáñez, M., Dubois-Lacoste, J., Cáceres, L. P., Birattari, M., and Stützle, T. (2016). The irace package: Iterated racing for automatic algorithm configuration. Operations Research Perspectives, 3:43–58.
Lucas, S. M., Liu, J., Bravi, I., Gaina, R. D., Woodward, J., Volz, V., and Perez-Liebana, D. (2019). Efficient evolutionary methods for game agent optimisation: Model-based is best. arXiv preprint arXiv:1901.00723.
Patel, P. G., Carver, N., and Rahimi, S. (2011). Tuning computer gaming agents using q-learning. In 2011 Federated Conference on Computer Science and Information Systems (FedCSIS), pages 581–588. IEEE.
Pellicer, L. F. A. O. (2020). Otimização de hiperparâmetros de modelos machine learning com BarySearch. PhD thesis, Universidade de São Paulo.
Probst, P., Wright, M. N., and Boulesteix, A.-L. (2019). Hyperparameters and tuning strategies for random forest. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 9(3):e1301.
Rana, S., Li, C., Gupta, S., Nguyen, V., and Venkatesh, S. (2017). High dimensional Bayesian optimization with elastic Gaussian process. In International Conference on Machine Learning, pages 2883–2891. PMLR.
Roa, J., Gutiérrez, M., and Stegmayer, G. (2008). FAIA: Framework para la enseñanza de agentes en IA. IE Comunicaciones: Revista Iberoamericana de Informática Educativa, (8):43–56.
Savid, Y., Mahmoudi, R., Maskeliūnas, R., and Damaševičius, R. (2023). Simulated autonomous driving using reinforcement learning: A comparative study on Unity’s ML-Agents framework. Information, 14(5):290.
Schulman, J., Wolski, F., Dhariwal, P., Radford, A., and Klimov, O. (2017). Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347.
Toal, D. J., Bressloff, N. W., and Keane, A. J. (2008). Kriging hyperparameter tuning strategies. AIAA Journal, 46(5):1240–1252.
Unity ML-Agents (2024). Training with Proximal Policy Optimization. [link]. Online; accessed on 10/06/2025.
Wang, X., Jin, Y., Schmitt, S., and Olhofer, M. (2023). Recent advances in bayesian optimization. ACM Computing Surveys, 55(13s):1–36.
Zhuang, Z., Lei, K., Liu, J., Wang, D., and Guo, Y. (2023). Behavior proximal policy optimization. arXiv preprint arXiv:2302.11312.
Published
2025-09-17
How to Cite
MINOVES, Cristhian S.; CRUZ, André R. da. Effectiveness of Parameter Optimization of Proximal Policy Optimization for Agents in a Digital Game: A Comparative Study. In: WORKSHOP ON INFORMATION SYSTEMS (WSIS), 16., 2025, Rio Paranaíba/MG. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 135-144. DOI: https://doi.org/10.5753/wsis.2025.15254.
