Assessing the Robustness of Deep Q-Network Agents to Changes on Game Object Textures
Abstract
Research on autonomous agents aspires to achieve Artificial General Intelligence, in which agents, like humans, are able to understand concepts and learn how to solve tasks. We would like to observe this ability in game agents as well. Recent research on autonomous agents for game playing combines Deep Neural Networks with Reinforcement Learning algorithms. These networks are usually vision-based models, typically Convolutional Neural Networks (CNNs). However, such models can suffer performance degradation when dealing with different pixel patterns, an issue that also affects vision-based autonomous agents in games. Prior works have shown that CNN-based autonomous agents cannot reproduce the behavior learned in one scene when they are placed into a new version of that scene with different textures. In this work, we evaluate whether the agents learn high-level concepts of game elements, such as enemy, foreground, and background. Instead of testing the agents in a completely different scene, we designed two experiments based on slight changes. In the first experiment, we change the textures of only a subset of the game objects. In the second experiment, the agents play in an interpolated version of two scenes. Even when only a single game object texture is changed, the agents are not guaranteed to maintain good behavior. We show that, depending on the training scenario, the agents are not robust enough to generalize a high-level concept of the game objects.