Robot training in virtual environments using Reinforcement Learning techniques

Natália Souza Soares; João Marcelo Xavier Natário Teixeira; Veronica Teichrieb

doi:10.5753/svr_estendido.2020.12950

Natália Souza Soares UFPE
João Marcelo Xavier Natário Teixeira UFPE
Veronica Teichrieb UFPE

Abstract

In this work, we propose a framework for training a robot in a virtual environment using Reinforcement Learning (RL) techniques, thus facilitating the use of this type of approach in robotics. With our integrated solution for virtual training, environment parameters can be changed programmatically, which makes it easy to apply domain randomization techniques on the fly. We conducted experiments with a TurtleBot 2i on an indoor navigation task with static obstacle avoidance, using an RL algorithm called Proximal Policy Optimization (PPO). Our results show that, even though no real data was used during training, the trained model was able to generalize to different virtual environments and to real-world scenes.
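To illustrate what changing environment parameters on the fly can look like, here is a minimal Python sketch. It is not the framework described in this work. It assumes a hypothetical simulator wrapper (`RandomizedEnv`), and every parameter it randomizes (obstacle count and scale, floor friction, lighting, range-sensor noise) along with each sampling range is invented for the example. At every `reset()` a fresh parameter set is drawn uniformly, which is the basic form of domain randomization.

```python
import random
from dataclasses import dataclass


@dataclass
class EnvParams:
    """Hypothetical set of randomizable simulation parameters (illustrative only)."""
    num_obstacles: int
    obstacle_scale: float   # multiplier applied to obstacle size
    floor_friction: float   # friction coefficient of the floor
    light_intensity: float  # relative intensity of scene lighting
    lidar_noise_std: float  # std. dev. of range-sensor noise, in metres


# Hypothetical sampling ranges (low, high); a real setup would tune these to the task.
RANGES = {
    "num_obstacles": (3, 10),
    "obstacle_scale": (0.5, 1.5),
    "floor_friction": (0.4, 1.0),
    "light_intensity": (0.3, 1.0),
    "lidar_noise_std": (0.0, 0.05),
}


def sample_params(rng: random.Random) -> EnvParams:
    """Draw one uniformly randomized parameter set for a single episode."""
    lo, hi = RANGES["num_obstacles"]
    return EnvParams(
        num_obstacles=rng.randint(lo, hi),  # randint is inclusive on both ends
        obstacle_scale=rng.uniform(*RANGES["obstacle_scale"]),
        floor_friction=rng.uniform(*RANGES["floor_friction"]),
        light_intensity=rng.uniform(*RANGES["light_intensity"]),
        lidar_noise_std=rng.uniform(*RANGES["lidar_noise_std"]),
    )


class RandomizedEnv:
    """Toy stand-in for a simulator wrapper that re-randomizes on every reset."""

    def __init__(self, seed: int = 0):
        self.rng = random.Random(seed)  # private, seeded generator for reproducibility
        self.params = None

    def reset(self) -> EnvParams:
        # Domain randomization: new parameters at the start of each episode,
        # set programmatically instead of by editing the scene by hand.
        self.params = sample_params(self.rng)
        return self.params


if __name__ == "__main__":
    env = RandomizedEnv(seed=42)
    for episode in range(3):
        print(episode, env.reset())
```

Seeding the generator makes each randomized sequence reproducible, which helps when comparing training runs. In a real setup, the sampled values would be pushed into the simulator before the episode starts.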
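The core of PPO is its clipped surrogate objective. For each sampled action, the probability ratio r between the new and the old policy is clipped to [1 − ε, 1 + ε]. The objective then keeps the smaller of the clipped and unclipped advantage-weighted terms, which discourages overly large policy updates. The minimal Python sketch below computes this objective for a batch of samples from log-probabilities and advantage estimates. It is not the training code used in this work; the function name and the default ε = 0.2 are assumptions for illustration.

```python
import math
from typing import Sequence


def ppo_clip_objective(
    new_log_probs: Sequence[float],
    old_log_probs: Sequence[float],
    advantages: Sequence[float],
    epsilon: float = 0.2,  # assumed clip range, for illustration
) -> float:
    """Mean clipped surrogate objective of PPO over a batch (a quantity to maximize)."""
    if not (len(new_log_probs) == len(old_log_probs) == len(advantages)):
        raise ValueError("inputs must have the same length")
    if not advantages:
        raise ValueError("inputs must not be empty")
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio r = pi_new(a|s) / pi_old(a|s), computed from log-probs.
        ratio = math.exp(new_lp - old_lp)
        # Clip the ratio to [1 - epsilon, 1 + epsilon].
        clipped = min(max(ratio, 1.0 - epsilon), 1.0 + epsilon)
        # Take the pessimistic (smaller) of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

With a ratio of 2 and ε = 0.2, a positive advantage is credited only as if the ratio were 1.2, while a negative advantage keeps the full penalty. An RL library would typically maximize this quantity (or minimize its negative) by gradient ascent over several epochs per batch of collected experience.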

Keywords: Reinforcement Learning, Robotics, Virtual Environments, Simulation