Deep Reinforcement Learning for Mapless Navigation of a Hybrid Aerial-Underwater Vehicle Using Images
Abstract
Reinforcement Learning (RL) has shown impressive performance in video games and continuous control tasks. However, RL performs poorly with high-dimensional observations such as raw pixel images. It is generally accepted that RL policies based on physical state, such as laser sensor measurements, are more sample-efficient than pixel-based learning. This work presents a new approach that extracts information from a depth-map estimate and from raw pixel images to train an RL agent to perform mapless navigation of a Hybrid Unmanned Aerial-Underwater Vehicle (HUAUV). The proposed approach, Contrastive Unsupervised Prioritized Representations in Reinforcement Learning (CUPRL) and its depth-imaged variant (Depth-CUPRL), estimates depth from raw pixel images and combines them with a prioritized replay memory. A combination of RL and Contrastive Learning addresses the problem of RL from image observations. Contrastive Learning builds a latent space that maps pixel and depth images such that, even when only pixel images are available, efficient representations can be constructed to solve navigation tasks in complex environments. The results obtained with the HUAUV show that CUPRL and Depth-CUPRL are effective for decision-making and outperform state-of-the-art pixel-based approaches in mapless navigation.
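To make the contrastive component concrete, the sketch below illustrates a CURL-style contrastive (InfoNCE) objective of the kind CUPRL builds on. It is a minimal sketch under stated assumptions: PyTorch, 84x84 single-channel inputs (raw pixels or estimated depth), a learnable bilinear similarity matrix W, and illustrative names (Encoder, random_crop, contrastive_loss) that are not taken from the authors' implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    """Maps 84x84 single-channel images (raw pixels or estimated depth) to a latent vector."""
    def __init__(self, feature_dim=50):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2), nn.ReLU(),
            nn.Conv2d(32, 32, 3, stride=1), nn.ReLU(),
            nn.Flatten(),
        )
        with torch.no_grad():  # infer the flattened size once
            n_flat = self.conv(torch.zeros(1, 1, 84, 84)).shape[1]
        self.fc = nn.Linear(n_flat, feature_dim)

    def forward(self, x):
        return self.fc(self.conv(x))

def random_crop(batch, out=84):
    """Crop a batch of images to out x out (one shared offset here; CURL crops per image)."""
    _, _, h, w = batch.shape
    top = torch.randint(0, h - out + 1, (1,)).item()
    left = torch.randint(0, w - out + 1, (1,)).item()
    return batch[:, :, top:top + out, left:left + out]

def contrastive_loss(query_enc, key_enc, W, obs, augment=random_crop):
    """InfoNCE: two augmentations of the same observation form a positive pair;
    every other observation in the batch serves as a negative."""
    z_q = query_enc(augment(obs))            # anchor embeddings, shape (B, d)
    with torch.no_grad():
        z_k = key_enc(augment(obs))          # positive embeddings from the key encoder
    logits = z_q @ W @ z_k.T                 # bilinear similarities, shape (B, B)
    logits = logits - logits.max(dim=1, keepdim=True).values  # numerical stability
    labels = torch.arange(obs.shape[0])      # positives lie on the diagonal
    return F.cross_entropy(logits, labels)

# Example: a batch of 100x100 observations drawn from the replay memory.
q_enc, k_enc = Encoder(), Encoder()
k_enc.load_state_dict(q_enc.state_dict())    # key encoder starts as a copy (EMA-updated in CURL)
W = nn.Parameter(torch.rand(50, 50))          # learnable bilinear similarity matrix
obs = torch.rand(8, 1, 100, 100)
loss = contrastive_loss(q_enc, k_enc, W, obs)
loss.backward()                               # gradients flow to the query encoder and W

In the full CURL recipe, crops are sampled independently per image and the key encoder tracks the query encoder via an exponential moving average; both details are simplified here for brevity. The prioritized replay memory mentioned in the abstract would supply the obs batches, sampled in proportion to their temporal-difference error.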
