Deep Reinforcement Learning for Mapless Navigation of a Hybrid Aerial-Underwater Vehicle Using Images
Abstract
Reinforcement Learning (RL) has shown impressive performance in video games and continuous control tasks. However, RL performs poorly with high-dimensional observations such as raw pixel images. It is generally accepted that RL policies based on physical state, such as laser sensor measurements, are more sample-efficient than pixel-based learning. This work presents a new approach that extracts information from a depth-map estimate and from raw pixel images to train an RL agent to perform mapless navigation of a Hybrid Unmanned Aerial-Underwater Vehicle (HUAUV). The proposed approach, Contrastive Unsupervised Prioritized Representations in Reinforcement Learning (CUPRL) and its depth-imaged variant (Depth-CUPRL), estimates depth from raw pixel images and combines them with a prioritized replay memory. A combination of RL and Contrastive Learning addresses the problem of RL from image observations. Contrastive Learning builds a latent space that maps pixel and depth images such that, even when only pixel images are available, efficient representations can be constructed to solve navigation tasks in complex environments. The results obtained with the HUAUV show that CUPRL and Depth-CUPRL are effective for decision-making and outperform state-of-the-art pixel-based approaches in mapless navigation.
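To make the contrastive component concrete, the sketch below illustrates a CURL-style contrastive (InfoNCE) objective of the kind CUPRL builds on. It is a minimal sketch under stated assumptions: PyTorch, 84x84 single-channel inputs (raw pixels or estimated depth), a learnable bilinear similarity matrix W, and illustrative names (Encoder, random_crop, contrastive_loss) that are not taken from the authors' implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    """Maps 84x84 single-channel images (raw pixels or estimated depth) to a latent vector."""
    def __init__(self, feature_dim=50):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2), nn.ReLU(),
            nn.Conv2d(32, 32, 3, stride=1), nn.ReLU(),
            nn.Flatten(),
        )
        with torch.no_grad():  # infer the flattened size once
            n_flat = self.conv(torch.zeros(1, 1, 84, 84)).shape[1]
        self.fc = nn.Linear(n_flat, feature_dim)

    def forward(self, x):
        return self.fc(self.conv(x))

def random_crop(batch, out=84):
    """Crop a batch of images to out x out (one shared offset here; CURL crops per image)."""
    _, _, h, w = batch.shape
    top = torch.randint(0, h - out + 1, (1,)).item()
    left = torch.randint(0, w - out + 1, (1,)).item()
    return batch[:, :, top:top + out, left:left + out]

def contrastive_loss(query_enc, key_enc, W, obs, augment=random_crop):
    """InfoNCE: two augmentations of the same observation form a positive pair;
    every other observation in the batch serves as a negative."""
    z_q = query_enc(augment(obs))            # anchor embeddings, shape (B, d)
    with torch.no_grad():
        z_k = key_enc(augment(obs))          # positive embeddings from the key encoder
    logits = z_q @ W @ z_k.T                 # bilinear similarities, shape (B, B)
    logits = logits - logits.max(dim=1, keepdim=True).values  # numerical stability
    labels = torch.arange(obs.shape[0])      # positives lie on the diagonal
    return F.cross_entropy(logits, labels)

# Example: a batch of 100x100 observations drawn from the replay memory.
q_enc, k_enc = Encoder(), Encoder()
k_enc.load_state_dict(q_enc.state_dict())    # key encoder starts as a copy (EMA-updated in CURL)
W = nn.Parameter(torch.rand(50, 50))          # learnable bilinear similarity matrix
obs = torch.rand(8, 1, 100, 100)
loss = contrastive_loss(q_enc, k_enc, W, obs)
loss.backward()                               # gradients flow to the query encoder and W

In the full CURL recipe, crops are sampled independently per image and the key encoder tracks the query encoder via an exponential moving average; both details are simplified here for brevity. The prioritized replay memory mentioned in the abstract would supply the obs batches, sampled in proportion to their temporal-difference error.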
