I Choose You, Reinforcement Learning! Trained RL Agents For Pokémon Battles

  • Leonardo de Lellis Rossi, FEEC / Unicamp / H.IAAC
  • Bruno Souza, Unicamp / Recod.ai
  • Maurício Pereira Lopes, Unicamp
  • Ricardo Ribeiro Gudwin, H.IAAC / FEEC / Unicamp
  • Esther Luna Colombini, Unicamp / H.IAAC

Abstract


Pokémon battles offer a valuable training environment for Reinforcement Learning (RL) agents because of their inherently stochastic nature and their adaptability to deterministic settings. However, the domain currently lacks a comprehensive benchmark of basic RL agent implementations suitable for training purposes. This project fills this gap with an open-source benchmark of agents trained with classic RL methods and Deep Reinforcement Learning (DRL) techniques, fostering development in the field and easing the entry of new researchers. We also propose a Markov Decision Process (MDP) environment in which the agents are trained and validated. The agents demonstrated effective learning and achieved robust performance during training.
Keywords: Reinforcement Learning, Deep Learning, Tabular Methods, Pokémon, Benchmark
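
The paper's benchmark and MDP formulation are not reproduced here. As an illustration of the classic, tabular end of the spectrum the abstract mentions, the sketch below runs Q-learning on a toy turn-based battle MDP; the environment, its state (both sides' HP), the two moves, damage values, and rewards are hypothetical stand-ins, not the authors' formulation.

# Minimal sketch (not the authors' benchmark code): tabular Q-learning on a
# hypothetical turn-based battle MDP. States, actions, damage values, and
# rewards are illustrative stand-ins for the formulation described above.
import random
from collections import defaultdict

class ToyBattleEnv:
    """Hypothetical 1-vs-1 battle: state is (own HP, foe HP); two moves."""
    def __init__(self, max_hp=10):
        self.max_hp = max_hp

    def reset(self):
        self.own_hp, self.foe_hp = self.max_hp, self.max_hp
        return (self.own_hp, self.foe_hp)

    def step(self, action):
        # Action 0: weak but reliable move; action 1: strong but stochastic move.
        dmg = 2 if action == 0 else (5 if random.random() < 0.6 else 0)
        self.foe_hp = max(0, self.foe_hp - dmg)
        if self.foe_hp == 0:
            return (self.own_hp, self.foe_hp), 1.0, True  # win
        self.own_hp = max(0, self.own_hp - random.choice([1, 3]))  # foe acts
        done = self.own_hp == 0
        return (self.own_hp, self.foe_hp), (-1.0 if done else 0.0), done

def q_learning(env, episodes=5000, alpha=0.1, gamma=0.99, epsilon=0.1):
    # Q-table mapping each state to the value of the two actions.
    q = defaultdict(lambda: [0.0, 0.0])
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # Epsilon-greedy action selection.
            if random.random() < epsilon:
                action = random.randrange(2)
            else:
                action = max((0, 1), key=lambda a: q[state][a])
            next_state, reward, done = env.step(action)
            # One-step temporal-difference update.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q

if __name__ == "__main__":
    q = q_learning(ToyBattleEnv())
    print("Greedy move at full HP:", max((0, 1), key=lambda a: q[(10, 10)][a]))

A DRL counterpart along the lines the abstract describes (e.g., a DQN as in Mnih et al., 2013) would replace the table with a network over a featurized battle state; the benchmark covers both ends of that spectrum.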

References

Arulkumaran, K., Deisenroth, M. P., Brundage, M., and Bharath, A. A. (2017). Deep Reinforcement Learning: A brief survey. IEEE Signal Processing Magazine, 34(6):26–38.

Huang, D. and Lee, S. (2019). A self-play policy optimization approach to battling Pokémon. In Proceedings of the 2019 IEEE Conference on Games (CoG), pages 1–4.

Kalose, A., Kaya, K., and Kim, A. (2018). Optimal Battle Strategy in Pokémon using Reinforcement Learning. Stanford University.

Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., and Riedmiller, M. (2013). Playing Atari with Deep Reinforcement Learning. DeepMind Technologies, arXiv preprint arXiv:1312.5602.

Nintendo. (2024). Pokémon official site. [link]

Osband, I., Blundell, C., Pritzel, A., and Van Roy, B. (2016). Deep exploration via bootstrapped DQN. In Advances in Neural Information Processing Systems, volume 29. Curran Associates, Inc.

Rill-García, R. (2018). Reinforcement Learning for a Turn-Based Small Scale Attrition Game.

Schulman, J., Wolski, F., Dhariwal, P., Radford, A., and Klimov, O. (2017). Proximal Policy Optimization algorithms. arXiv preprint arXiv:1707.06347.

Simões, D., Reis, S., Lau, N., and Reis, L. P. (2020). Competitive Deep Reinforcement Learning over a Pokémon battling simulator. In 2020 IEEE International Conference on Autonomous Robot Systems and Competitions (ICARSC), pages 40–45. IEEE.

Sutton, R. S. and Barto, A. G. (2018). Reinforcement Learning: An introduction. MIT Press, Cambridge, MA.

Watkins, C. (1989). Learning From Delayed Rewards. Ph.D. thesis, King's College, University of Cambridge.

Williams, R. J. (1992). Simple statistical gradient-following algorithms for connectionist Reinforcement Learning. Machine Learning, 8:229–256.

Zhang, J., Kim, J., O’Donoghue, B., and Boyd, S. (2021). Sample efficient Reinforcement Learning with REINFORCE. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, pages 10887–10895.
Published
30/09/2024
ROSSI, Leonardo de Lellis; SOUZA, Bruno; LOPES, Maurício Pereira; GUDWIN, Ricardo Ribeiro; COLOMBINI, Esther Luna. I Choose You, Reinforcement Learning! Trained RL Agents For Pokémon Battles. In: TRILHA DE COMPUTAÇÃO – ARTIGOS CURTOS - SIMPÓSIO BRASILEIRO DE JOGOS E ENTRETENIMENTO DIGITAL (SBGAMES), 2024. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2024. p. 13-18. DOI: https://doi.org/10.5753/sbgames_estendido.2024.241242.