Inducing selfish agents towards social efficient solutions

João Schapke; Ana Bazzan

doi:10.5753/kdmile.2020.11953

João Schapke UFRGS
Ana Bazzan UFRGS

Abstract

Many multi-agent reinforcement learning (MARL) scenarios lead towards Nash equilibria, which are known to not always be socially efficient. In this study, we aim to align the social optimization objective of the system with the individual objectives of the agents by adopting a central controller that can interact with the agents. In detail, our approach establishes a communication channel between the reinforcement learning agents and a controller implemented with metaheuristics. The interaction benefits the convergence of both algorithms. Further, we evaluate our method in repeated games with a high price of anarchy and show that our approach is able to overcome many of the issues caused by the non-cooperative behaviour of the agents and the non-stationary effects they cause.
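To make the notion of a high price of anarchy concrete, the following is a minimal sketch, not the method or the games used in the paper. It assumes a hypothetical two-player Prisoner's Dilemma payoff table and takes the price of anarchy to be the best achievable social welfare divided by the welfare of the worst pure Nash equilibrium. The names `PAYOFFS`, `is_pure_nash`, `social_welfare` and `price_of_anarchy` are illustrative only.

```python
from itertools import product

# Hypothetical payoff table for a two-player Prisoner's Dilemma (not from the paper).
# Actions: 0 = cooperate, 1 = defect.
# Each joint action maps to (payoff of player 0, payoff of player 1).
PAYOFFS = {
    (0, 0): (3, 3),
    (0, 1): (0, 5),
    (1, 0): (5, 0),
    (1, 1): (1, 1),
}
ACTIONS = (0, 1)


def is_pure_nash(joint):
    """Return True if no player gains by unilaterally deviating from `joint`."""
    for player in range(2):
        for alt in ACTIONS:
            deviated = list(joint)
            deviated[player] = alt
            if PAYOFFS[tuple(deviated)][player] > PAYOFFS[joint][player]:
                return False
    return True


def social_welfare(joint):
    """Sum of both players' payoffs for a joint action."""
    return sum(PAYOFFS[joint])


def price_of_anarchy():
    """Return the pure Nash equilibria and the ratio optimum / worst equilibrium."""
    equilibria = [j for j in product(ACTIONS, repeat=2) if is_pure_nash(j)]
    best = max(social_welfare(j) for j in PAYOFFS)
    worst_eq = min(social_welfare(j) for j in equilibria)
    return equilibria, best / worst_eq


if __name__ == "__main__":
    print(price_of_anarchy())
```

In this toy game, mutual defection is the only pure equilibrium, so selfish play yields a social welfare of 2 against an optimum of 6. That gap between individual and collective objectives is the kind a central controller would aim to close.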

Keywords: genetic algorithm, metaheuristics, Q-learning, reinforcement learning
