Systematic choice of video game benchmarks in Deep Reinforcement Learning


Deep Reinforcement Learning has gained much attention for the results its methods have achieved on high-dimensional problems that were previously intractable or difficult to solve. In this context, video games have been widely used as experimental environments and benchmarks, both to evaluate reinforcement learning algorithms and to guide the development of new methods. Although much work has been done in Deep Reinforcement Learning since its seminal work was proposed, there has been little discussion of proper methodologies for constructing such evaluation benchmarks. This paper proposes to systematize the choice of video games used as a benchmark, so as to guarantee that the learning environments are representative and diverse, by drawing on video game typologies proposed in the area of Game Design Research.

Keywords: enemy generation, procedural content generation, video games, parallel evolutionary algorithm


GOMES, Élvio; SOUZA, Marlo. Systematic choice of video game benchmarks in Deep Reinforcement Learning. In: SIMPÓSIO BRASILEIRO DE JOGOS E ENTRETENIMENTO DIGITAL (SBGAMES), 20., 2021, Online. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2021. p. 162-171.