A 3D Q-Learning Algorithm for Offline UAV Path Planning with Priority Shifting Rewards

Kevin Braathen de Carvalho; Hiago B. Batista; Iure L. de Oliveira; Alexandre S. Brandão

Kevin Braathen de Carvalho UFV
Hiago B. Batista UFV
Iure L. de Oliveira UFV
Alexandre S. Brandão UFV

Abstract

Autonomous navigational robotics is a field of great importance due to its vast array of applications, such as exploration, transportation, industry and defense. In these scenarios, Unmanned Aerial Vehicles (UAVs) enable different approaches that can increase a task's efficiency and/or flexibility. In this paper we propose an offline path planning method for static 3D environments using Q-Learning. The reward is shaped to account for three different priorities, namely path length, energy consumption and safety, which the user can tune freely to suit the desired application. The proposed algorithm is able to guide the agent towards the goal from anywhere in the map, which is helpful in scenarios where internal or external instabilities are expected to lead the agent astray from its main path. Scalability tests were also performed to benchmark the proposed method's performance on larger maps.

Keywords: Three-dimensional displays, Q-learning, Service robots, Navigation, Transportation industry, Scalability, Autonomous aerial vehicles, Mobile Robotics, Reinforcement Learning, Path Planning, Unmanned Aerial Vehicles
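
As a rough illustration of the idea summarized in the abstract, the sketch below runs tabular Q-learning on a small static 3D grid with a reward built from three user-tunable weights. It is not the authors' implementation: the grid size, obstacle layout, reward magnitudes, the hyperparameters, and the names `w_len`, `w_energy`, `w_safe`, `step`, `train` and `greedy_path` are all assumptions made for this example. In particular, charging extra energy for climbing and penalizing cells adjacent to an obstacle are only one plausible way to encode the energy and safety priorities.

```python
import random

# Hypothetical 3D grid world; every constant here is an illustrative assumption.
SIZE = (5, 5, 3)                                   # cells along x, y, z
GOAL = (4, 4, 1)
OBSTACLES = {(2, 2, 0), (2, 2, 1), (2, 2, 2), (2, 1, 1)}
ACTIONS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def in_bounds(s):
    return all(0 <= s[i] < SIZE[i] for i in range(3))


def near_obstacle(s):
    # A cell counts as "unsafe" if any face-adjacent cell is an obstacle.
    return any((s[0] + dx, s[1] + dy, s[2] + dz) in OBSTACLES
               for dx, dy, dz in ACTIONS)


def step(state, action, w_len, w_energy, w_safe):
    """Apply one move; return (next_state, reward, done)."""
    nxt = tuple(state[i] + action[i] for i in range(3))
    if not in_bounds(nxt) or nxt in OBSTACLES:
        return state, -10.0, False                 # blocked: stay put, penalized
    if nxt == GOAL:
        return nxt, 100.0, True
    reward = -w_len                                # path length: every step costs
    if action[2] > 0:
        reward -= 2.0 * w_energy                   # energy: climbing assumed costlier
    if near_obstacle(nxt):
        reward -= w_safe                           # safety: flying close to obstacles
    return nxt, reward, False


def train(w_len=1.0, w_energy=1.0, w_safe=1.0, episodes=3000,
          alpha=0.5, gamma=0.95, eps=0.2, seed=0):
    rng = random.Random(seed)
    free = [(x, y, z)
            for x in range(SIZE[0]) for y in range(SIZE[1]) for z in range(SIZE[2])
            if (x, y, z) not in OBSTACLES and (x, y, z) != GOAL]
    n = len(ACTIONS)
    q = {(s, a): 0.0 for s in free + [GOAL] for a in range(n)}
    for _ in range(episodes):
        s = rng.choice(free)                       # random starts cover the whole map
        for _ in range(200):
            if rng.random() < eps:                 # epsilon-greedy exploration
                a = rng.randrange(n)
            else:
                a = max(range(n), key=lambda k: q[(s, k)])
            nxt, r, done = step(s, ACTIONS[a], w_len, w_energy, w_safe)
            best_next = 0.0 if done else max(q[(nxt, k)] for k in range(n))
            q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])
            s = nxt
            if done:
                break
    return q


def greedy_path(q, start, max_steps=50):
    """Follow the learned greedy policy from any start cell."""
    path, s = [start], start
    while s != GOAL and len(path) <= max_steps:
        a = max(range(len(ACTIONS)), key=lambda k: q[(s, k)])
        s, _, _ = step(s, ACTIONS[a], 0.0, 0.0, 0.0)
        path.append(s)
    return path


if __name__ == "__main__":
    q_table = train(w_len=1.0, w_energy=1.0, w_safe=1.0)
    print(greedy_path(q_table, (0, 0, 0)))
```

Because training episodes start from random free cells, the resulting table can be queried from any cell, which mirrors the abstract's point about recovering after the agent is pushed off its main path. Under these assumptions, raising `w_safe` relative to the other weights should bias the greedy path away from cells next to obstacles, while raising `w_energy` should discourage climbing.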