Gym Hero: A Research Environment for Reinforcement Learning Agents in Rhythm Games

Abstract


This work presents a Reinforcement Learning environment, called Gym Hero, based on the game Guitar Hero. It consists of an implementation of a similar game, developed with the PyGame library, featuring four difficulty levels and the ability to generate random tracks. On top of the game, we implemented a Gym environment to train and evaluate Reinforcement Learning agents. To assess the environment's suitability as a learning tool, we ran a set of experiments training three autonomous agents with Deep Reinforcement Learning. Each agent was trained on a different difficulty level using Deep Q-Networks, a technique that combines Reinforcement Learning with Deep Neural Networks; the network's only input is the raw pixels of the screen. We show that the agents were able to learn the behaviors expected to play the game. The results validate the proposed environment as a suitable tool for evaluating autonomous agents on Reinforcement Learning tasks.
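
As a rough illustration of the interface such an environment exposes, the sketch below follows the classic OpenAI Gym API, with raw screen pixels as observations. The class name GymHeroEnv, the 84x84 grayscale observation shape, and the six-action space are assumptions made here for illustration; they are not taken from the authors' implementation.

    import gym
    import numpy as np
    from gym import spaces

    class GymHeroEnv(gym.Env):
        """Illustrative sketch of a Gym wrapper around a PyGame rhythm game.

        Names, shapes, and the action encoding are assumptions, not the
        authors' API.
        """

        def __init__(self, difficulty=0):
            super().__init__()
            # Observations are raw screen pixels, as described in the paper.
            # The 84x84 grayscale shape is the usual Atari DQN preprocessing,
            # assumed here for illustration.
            self.observation_space = spaces.Box(
                low=0, high=255, shape=(84, 84, 1), dtype=np.uint8)
            # One action per fret button plus a no-op; the exact encoding
            # is an assumption.
            self.action_space = spaces.Discrete(6)
            self.difficulty = difficulty  # one of the four levels

        def reset(self):
            # A real implementation would generate a random track and
            # return the first frame rendered by PyGame.
            return np.zeros(self.observation_space.shape, dtype=np.uint8)

        def step(self, action):
            # A real implementation would advance the game loop, score the
            # hit or miss as the reward, and render the next frame.
            obs = np.zeros(self.observation_space.shape, dtype=np.uint8)
            reward, done, info = 0.0, False, {}
            return obs, reward, done, info

    # An agent interacts through the standard reset/step loop;
    # here a random policy stands in for a trained DQN.
    env = GymHeroEnv(difficulty=0)
    obs = env.reset()
    for _ in range(10):
        obs, reward, done, info = env.step(env.action_space.sample())
        if done:
            obs = env.reset()

Wrapping the game behind this interface means any Gym-compatible Deep Q-Network implementation can be trained on it unchanged, using only the rendered frames as input.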

Keywords: autonomous agents, reinforcement learning, deep learning, reinforcement learning environments, rhythm games, guitar hero

Published
18/10/2021
How to Cite

FÉRRER FILHO, Rômulo Freire; NOGUEIRA, Yuri Lenon Barbosa; VIDAL, Creto Augusto; CAVALCANTE-NETO, Joaquim Bento; SERAFIM, Paulo Bruno de Sousa. Gym Hero: A Research Environment for Reinforcement Learning Agents in Rhythm Games. In: SIMPÓSIO BRASILEIRO DE JOGOS E ENTRETENIMENTO DIGITAL (SBGAMES), 20., 2021, Online. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2021. p. 87-96.