Centralized Critic per Knowledge for Cooperative Multi-Agent Game Environments

  • Thaís Ferreira, UFF
  • Esteban Clua, UFF
  • Troy Costa Kohwalter, UFF

Abstract

Cooperative multiplayer games are based on rules under which players must collaborate to solve certain tasks. These games pose specific challenges for Multi-Agent Reinforcement Learning (MARL), since training collaborative behaviors must cope with partial observability, non-stationarity, and the credit assignment problem. One approach used in MARL to address these challenges is centralized training with decentralized execution: knowledge about the full state and information of the environment is exploited during the training phase, but the learned policies do not depend on this knowledge and execute in a decentralized way. In this work, we study the centralized training with decentralized execution approach. We seek to validate whether dividing knowledge about the environment (e.g., observations, perception of objects, enemies, obstacles) across different groups, each with its own centralized critic, improves learning performance in multi-agent environments. Our results show that assigning a centralized critic per knowledge group improves training, but it also increases the duration of the training process.
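To make the centralized-critic-per-knowledge idea concrete, the sketch below shows one possible shape of centralized training with decentralized execution: each actor conditions only on its local observation, while each knowledge group (e.g., objects, enemies, obstacles) gets its own centralized critic with access to the full state. This is a minimal PyTorch sketch under assumed network sizes, group names, and a toy update step; it is not the paper's MA-POCA/ML-Agents implementation.

```python
import torch
import torch.nn as nn

# Illustrative dimensions (assumptions, not from the paper).
OBS_DIM = 16        # local observation size seen by each agent
STATE_DIM = 64      # full environment state, available only during training
N_ACTIONS = 4

class Actor(nn.Module):
    """Decentralized policy: conditions only on the agent's local observation."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(OBS_DIM, 64), nn.ReLU(),
            nn.Linear(64, N_ACTIONS),
        )

    def forward(self, obs):
        return torch.distributions.Categorical(logits=self.net(obs))

class CentralizedCritic(nn.Module):
    """Critic for one knowledge group: sees the full state during training."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, 64), nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, state):
        return self.net(state).squeeze(-1)

# One centralized critic per knowledge group (hypothetical group names).
critics = {g: CentralizedCritic() for g in ("objects", "enemies", "obstacles")}
actors = [Actor() for _ in range(3)]  # one decentralized policy per agent

# Dummy rollout data to show the training-time information flow.
obs = torch.randn(3, OBS_DIM)       # per-agent local observations
state = torch.randn(STATE_DIM)      # full state, used only by the critics
returns = {g: torch.tensor(1.0) for g in critics}  # placeholder returns

for group, critic in critics.items():
    value = critic(state)
    critic_loss = (returns[group] - value).pow(2)      # simple value regression
    advantage = (returns[group] - value).detach()      # group-specific signal
    for agent_id, actor in enumerate(actors):
        dist = actor(obs[agent_id])
        action = dist.sample()
        actor_loss = -dist.log_prob(action) * advantage
        # In a real trainer, losses would be accumulated and optimized
        # per network; execution would use only the actors.
```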

Keywords: Cooperative Multi-Agents, Reinforcement Learning, ML-Agents, MA-POCA

Published
18/10/2021
How to Cite

FERREIRA, Thaís; CLUA, Esteban; KOHWALTER, Troy Costa. Centralized Critic per Knowledge for Cooperative Multi-Agent Game Environments. In: SIMPÓSIO BRASILEIRO DE JOGOS E ENTRETENIMENTO DIGITAL (SBGAMES), 20., 2021, Online. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2021. p. 39-48.