Robot training in virtual environments using Reinforcement Learning techniques
Abstract
In this work, we propose a framework for training a robot in a virtual environment using Reinforcement Learning (RL) techniques, thus facilitating the use of this type of approach in robotics. Our integrated solution for virtual training makes it possible to change the environment parameters programmatically, so domain randomization techniques can be applied on-the-fly. We conducted experiments with a TurtleBot 2i in an indoor navigation task with static obstacle avoidance, using the RL algorithm Proximal Policy Optimization (PPO). Our results show that, even though training used no real data, the trained model was able to generalize to different virtual environments and to real-world scenes.
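The on-the-fly domain randomization described above can be sketched as follows. This is a minimal illustration only: the parameter names and value ranges are assumptions for the sake of the example, not the paper's actual simulator configuration.

```python
import random

# Hypothetical parameter set for the virtual scene; the names and ranges
# below are illustrative assumptions, not the paper's actual configuration.
def randomize_environment(rng):
    """Sample a fresh environment configuration (domain randomization)."""
    n_obstacles = rng.randint(2, 10)
    return {
        "obstacle_count": n_obstacles,
        "obstacle_positions": [
            (rng.uniform(-5.0, 5.0), rng.uniform(-5.0, 5.0))
            for _ in range(n_obstacles)
        ],
        "floor_friction": rng.uniform(0.4, 1.0),
        "light_intensity": rng.uniform(0.5, 1.5),
    }

# Before each training episode, the simulator would be reconfigured
# on-the-fly with a newly sampled parameter set:
rng = random.Random(42)
for episode in range(3):
    params = randomize_environment(rng)
    # simulator.apply(params)  # push the sampled parameters to the scene
```

Because each episode sees a different randomized scene, the learned policy cannot overfit to one specific layout, which is what enables the sim-to-real generalization reported in the abstract.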
Keywords:
Reinforcement Learning, Robotics, Virtual Environments, Simulation
References
P. Kormushev, S. Calinon, and D. G. Caldwell, “Reinforcement learning in robotics: Applications and real-world challenges,” Robotics, vol. 2, no. 3, pp. 122–148, 2013.
R. S. Sutton and A. G. Barto, Reinforcement learning: An introduction. MIT press, 2018.
J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, “Proximal policy optimization algorithms,” arXiv preprint arXiv:1707.06347, 2017.
K. Bousmalis and S. Levine, “Closing the simulation-to-reality gap for deep robotic learning,” Google Research Blog, 2017.
F. Sadeghi, A. Toshev, E. Jang, and S. Levine, “Sim2real view invariant visual servoing by recurrent control,” arXiv preprint arXiv:1712.07642, 2017.
F. Sadeghi and S. Levine, “Cad2rl: Real single-image flight without a single real image,” arXiv preprint arXiv:1611.04201, 2016.
J. Kua, N. Corso, and A. Zakhor, “Automatic loop closure detection using multiple cameras for 3d indoor localization,” in Computational Imaging X, vol. 8296, p. 82960V, International Society for Optics and Photonics, 2012.
A. Francis, A. Faust, H.-T. Chiang, J. Hsu, J. C. Kew, M. Fiser, and T.-W. E. Lee, “Long-range indoor navigation with prm-rl,” IEEE Transactions on Robotics, 2020.
H.-T. L. Chiang, A. Faust, M. Fiser, and A. Francis, “Learning navigation behaviors end-to-end with autorl,” IEEE Robotics and Automation Letters, vol. 4, no. 2, pp. 2007–2014, 2019.
L. Xie, S. Wang, S. Rosa, A. Markham, and N. Trigoni, “Learning with training wheels: speeding up training with a simple controller for deep reinforcement learning,” in 2018 IEEE International Conference on Robotics and Automation (ICRA), pp. 6276–6283, IEEE, 2018.
O. Zhelo, J. Zhang, L. Tai, M. Liu, and W. Burgard, “Curiosity-driven exploration for mapless navigation with deep reinforcement learning,” arXiv preprint arXiv:1804.00456, 2018.
S. Li, Y. Wu, X. Cui, H. Dong, F. Fang, and S. Russell, “Robust multi-agent reinforcement learning via minimax deep deterministic policy gradient,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, pp. 4213–4220, 2019.
J. Schulman, S. Levine, P. Abbeel, M. Jordan, and P. Moritz, “Trust region policy optimization,” in International conference on machine learning, pp. 1889–1897, 2015.
B. Bakker, “Reinforcement learning with long short-term memory,” in Advances in neural information processing systems, pp. 1475–1482, 2002.
J. Kober, J. A. Bagnell, and J. Peters, “Reinforcement learning in robotics: A survey,” The International Journal of Robotics Research, vol. 32, no. 11, pp. 1238–1274, 2013.
E. Liang, R. Liaw, R. Nishihara, P. Moritz, R. Fox, K. Goldberg, J. Gonzalez, M. Jordan, and I. Stoica, “RLlib: Abstractions for distributed reinforcement learning,” in International Conference on Machine Learning, pp. 3053–3062, 2018.
ROS.org, “The Robot Operating System (ROS),” 2017.
P. Henderson, R. Islam, P. Bachman, J. Pineau, D. Precup, and D. Meger, “Deep reinforcement learning that matters,” arXiv preprint arXiv:1709.06560, 2017.
Published
07/11/2020
How to Cite
SOARES, Natália Souza; TEIXEIRA, João Marcelo Xavier Natário; TEICHRIEB, Veronica. Robot training in virtual environments using Reinforcement Learning techniques. In: WORKSHOP DE INICIAÇÃO CIENTÍFICA - SIMPÓSIO DE REALIDADE VIRTUAL E AUMENTADA (SVR), 22., 2020, Evento Online. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2020. p. 25-29. DOI: https://doi.org/10.5753/svr_estendido.2020.12950.