Towards Heterogeneous Multi-Agent Reinforcement Learning with Graph Neural Networks
Abstract
This work proposes a neural network architecture that learns policies for multiple agent classes in a heterogeneous multi-agent reinforcement learning setting. The proposed network represents states as directed labeled graphs, encodes feature vectors of different sizes for different entity classes, uses relational graph convolution layers to model distinct communication channels between entity types, and learns a separate policy for each agent class while sharing parameters wherever possible. Results show that specializing the communication channels between entity classes is a promising step towards higher performance in environments composed of heterogeneous entities.
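To make the described pipeline concrete, the sketch below gives one possible reading of it in PyTorch. It is a minimal illustration under stated assumptions, not the implementation evaluated in this work: the class name HeteroGraphPolicyNet, its constructor arguments, and the single round of message passing are choices made only for the example. Per-class encoders map class-specific feature vectors to a common hidden size, one weight matrix per edge label implements a relational graph convolution step, and per-class heads produce policy logits, so parameters are shared among agents of the same class.

```python
import torch
import torch.nn as nn


class HeteroGraphPolicyNet(nn.Module):
    """Minimal sketch (hypothetical): per-class encoders, one relational graph
    convolution round, and per-class policy heads shared within each class."""

    def __init__(self, class_feat_dims, class_action_dims, num_relations, hidden_dim=64):
        super().__init__()
        # One encoder per entity class, mapping class-specific feature sizes
        # to a common hidden size so messages can be exchanged between classes.
        self.encoders = nn.ModuleDict(
            {c: nn.Linear(d, hidden_dim) for c, d in class_feat_dims.items()}
        )
        # One weight matrix per edge label (relation), as in relational graph
        # convolution, plus a self-loop transform for each node's own state.
        self.rel_weights = nn.ModuleList(
            [nn.Linear(hidden_dim, hidden_dim, bias=False) for _ in range(num_relations)]
        )
        self.self_loop = nn.Linear(hidden_dim, hidden_dim)
        # One policy head per agent class (non-agent entity classes get none);
        # all agents of the same class share these parameters.
        self.policy_heads = nn.ModuleDict(
            {c: nn.Linear(hidden_dim, a) for c, a in class_action_dims.items()}
        )

    def forward(self, node_feats, node_classes, edge_index, edge_type):
        # node_feats: list of 1-D tensors, one per node; size depends on the node's class.
        # node_classes: list of class names aligned with node_feats.
        # edge_index: LongTensor [2, E] with (source, target) node indices.
        # edge_type: LongTensor [E] with the label (relation id) of each edge.
        h = torch.stack(
            [self.encoders[c](x) for c, x in zip(node_classes, node_feats)]
        )
        # Relational message passing: messages along edges of relation r are
        # transformed by that relation's weight matrix and summed at the target.
        agg = torch.zeros_like(h)
        for r, lin in enumerate(self.rel_weights):
            mask = edge_type == r
            if mask.any():
                src, dst = edge_index[:, mask]
                agg = agg.index_add(0, dst, lin(h[src]))
        h = torch.relu(self.self_loop(h) + agg)
        # Policy logits only for nodes whose class has a policy head (i.e. agents).
        return {
            i: self.policy_heads[c](h[i])
            for i, c in enumerate(node_classes)
            if c in self.policy_heads
        }
```

In this reading, each edge label corresponds to an ordered pair of entity classes, so messages exchanged along different class-to-class channels are transformed by different weights; this is one way of realizing the specialized communication channels described above. A full agent would stack several such layers and train the per-class heads with a standard reinforcement learning objective.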