Reinforcement Learning Applied to Train Autonomous Maritime Search and Rescue Drones

Jorás C. C. de Oliveira; Pedro H. B. A. Andrade; Renato L. Falcão; Ricardo R. Rodrigues; José Fernando Brancalion; Fabrício J. Barth

doi:10.5753/eniac.2024.245030

Jorás C. C. de Oliveira, Insper
Pedro H. B. A. Andrade, Insper
Renato L. Falcão, Insper
Ricardo R. Rodrigues, Insper
José Fernando Brancalion, Embraer
Fabrício J. Barth, Insper

Abstract

This paper presents a Search and Rescue (SAR) environment tailored for locating shipwrecked individuals and uses it to evaluate Reinforcement Learning (RL) algorithms across different scenarios, testing a variety of hypotheses through an extensive set of experiments. Our findings indicate that RL techniques, particularly Proximal Policy Optimization (PPO), achieve significantly higher success rates than traditional greedy algorithms. Centralized network architectures converge better than decentralized ones. Historical search data does not notably improve algorithm performance, suggesting that real-time observations are sufficient. Agents naturally parallelize the search effort within a given probability zone while prioritizing higher-probability areas first. Finally, although handling multiple persons-in-water (PIWs) increases complexity, agents coordinate effectively and improve over time, underscoring the potential of RL for complex SAR missions and for optimizing SAR operations.
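To make the comparison with greedy algorithms concrete, the following is a minimal sketch of one possible greedy baseline on a grid of probabilities. The abstract does not specify the baseline used in the paper, so the single-drone setting, the grid representation, the movement rule, and the names `argmax_cell`, `sign`, and `greedy_search` are all assumptions made for illustration.

```python
def argmax_cell(grid):
    """Return (row, col) of the largest value; the first cell wins on ties."""
    best = (0, 0)
    for r, row in enumerate(grid):
        for c, value in enumerate(row):
            if value > grid[best[0]][best[1]]:
                best = (r, c)
    return best


def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def greedy_search(start, prob_grid, max_steps):
    """Hypothetical greedy baseline (an assumption, not the paper's method).

    At every step the drone marks its current cell as searched, then moves
    one cell (diagonals allowed) toward the unsearched cell that currently
    holds the highest probability. Returns the list of visited cells.
    """
    grid = [list(row) for row in prob_grid]  # work on a copy
    pos = start
    path = [pos]
    for _ in range(max_steps):
        grid[pos[0]][pos[1]] = 0.0  # current cell has been searched
        target = argmax_cell(grid)
        if grid[target[0]][target[1]] <= 0.0:
            break  # no probability mass left to search
        pos = (pos[0] + sign(target[0] - pos[0]),
               pos[1] + sign(target[1] - pos[1]))
        path.append(pos)
    return path


if __name__ == "__main__":
    demo_grid = [
        [0.1, 0.0, 0.3],
        [0.0, 0.0, 0.0],
        [0.0, 0.0, 0.6],
    ]
    print(greedy_search((0, 0), demo_grid, max_steps=10))
```

A rule like this always chases the single highest-probability cell and has no notion of other agents, which is one reason a learned policy that splits a probability zone among several drones might achieve a higher success rate.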

Keywords: maritime search and rescue, deep reinforcement learning
