Learning by Demonstration of Coordinated Plans in Multiagent Systems

Marco A. C. Simões; Tatiane Nogueira

doi:10.5753/wtdr_ctdr.2022.227372

Marco A. C. Simões UFBA / UNEB
Tatiane Nogueira UFBA

Abstract

One of the significant challenges in Multiagent Systems (MAS) is creating cooperative plans for the varied scenarios that arise in a dynamic, real-time environment populated by teams of mobile robots. This work captures human knowledge to demonstrate how robot teams can cooperate better on the problem they must solve. The research used the RoboCup 3D Soccer Simulation (3DSSIM) environment, and human demonstrations were collected through a crowdsourcing strategy, using a set of tools adapted from existing solutions in the RoboCup community. Fuzzy clustering was then used to group human demonstrations (setplays) that share the same semantic meaning despite minor differences. With the data organized, this work used a reinforcement learning mechanism to learn a classification policy that lets agents decide which group of setplays best suits each situation that arises in the environment. The results show that the robot team is able to evolve by learning the suggested setplays and using them in a way suited to the skills of each robot.
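To make the clustering step more concrete, the following is a minimal sketch of how fuzzy clustering can group similar demonstrations. It assumes the standard fuzzy c-means algorithm, which the abstract does not name, and it represents each demonstration as a hypothetical two-dimensional feature vector (for example, the ball position when the setplay starts). The function name, parameters, and data are illustrative only and are not taken from the authors' toolkit.

```python
import math
import random


def fuzzy_c_means(points, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Plain fuzzy c-means (an assumed algorithm choice).

    Returns (centers, memberships), where memberships[i][k] is the degree
    to which point i belongs to cluster k; each row sums to 1.
    """
    rng = random.Random(seed)
    # Random initial memberships, each row normalised to sum to 1.
    u = []
    for _ in points:
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        u.append([v / total for v in row])

    dim = len(points[0])
    exponent = 2.0 / (m - 1.0)
    centers = []
    for _ in range(n_iter):
        # Each center is the mean of all points, weighted by u ** m.
        centers = []
        for k in range(n_clusters):
            weights = [row[k] ** m for row in u]
            total = sum(weights)
            centers.append([
                sum(w * p[d] for w, p in zip(weights, points)) / total
                for d in range(dim)
            ])

        # Recompute memberships from the distances to the new centers.
        new_u = []
        for p in points:
            dists = [math.dist(p, c) for c in centers]
            if any(d == 0.0 for d in dists):
                # The point sits exactly on a center: assign it crisply.
                hits = [1.0 if d == 0.0 else 0.0 for d in dists]
                new_u.append([h / sum(hits) for h in hits])
            else:
                new_u.append([
                    1.0 / sum((dk / dj) ** exponent for dj in dists)
                    for dk in dists
                ])

        # Stop once no membership changed by more than tol.
        shift = max(
            abs(a - b) for old, new in zip(u, new_u) for a, b in zip(old, new)
        )
        u = new_u
        if shift < tol:
            break
    return centers, u


# Hypothetical demonstrations: two groups that differ only slightly within
# each group, standing in for setplays with the same semantic meaning.
demos = [
    (-10.0, 0.0), (-9.5, 0.4), (-10.3, -0.2),
    (10.0, 5.0), (9.6, 5.3), (10.4, 4.8),
]
centers, memberships = fuzzy_c_means(demos, n_clusters=2)

# The cluster with the highest membership serves as each demo's group label.
labels = [max(range(2), key=row.__getitem__) for row in memberships]
```

Unlike hard clustering, every demonstration keeps a membership degree in each group, so a setplay that lies between two groups is not forced into one of them before the policy-learning step.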
