Reinforcement and Imitation Learning Applied to Autonomous Aerial Robot Control

Abstract


In robotics, the ultimate goal of reinforcement learning is to endow robots with the ability to learn, improve, adapt, and reproduce tasks under dynamically changing constraints, based on exploration and autonomous learning. Reinforcement Learning (RL) addresses this problem by enabling a robot to learn behaviors through trial and error. With RL, a neural network can be trained as a function approximator that maps states directly to actuator commands, making any predefined control structure unnecessary for training. However, the knowledge these methods need in order to converge is usually built from scratch, so learning can take a long time; moreover, RL algorithms require an explicit reward function, which is not always trivial to define. It is often easier for a teacher, whether a human or an intelligent agent, to demonstrate the desired behavior or how to accomplish a given task. Humans and other animals have a natural ability to learn skills from observation, often merely from seeing their effects, without direct knowledge of the underlying actions. The same principle underlies Imitation Learning, a practical approach for autonomous systems to acquire control policies when an explicit reward function is unavailable, using supervision provided as demonstrations from an expert, typically a human operator. In this scenario, this work's primary objective is to design an agent that can successfully imitate a previously acquired control policy using Imitation Learning. We choose GAIL as the imitation algorithm, since it learns directly from expert (state, action) trajectories and is therefore well suited to this problem. To generate the expert trajectories, we implement the state-of-the-art on-policy and off-policy methods PPO and SAC, respectively. Results show that the policies learned by all three methods can solve the task of low-level control of a quadrotor, and that all of them generalize on the original tasks.
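To make the pipeline concrete, the sketch below illustrates the expert-then-imitation setup the abstract describes: a PPO agent is trained as the expert, its (state, action) trajectories are collected, and a discriminator supplies the GAIL-style surrogate reward for the imitating policy, so no hand-crafted reward function is needed. This is a minimal sketch under stated assumptions, not the thesis's implementation: it assumes the stable-baselines3 and gymnasium libraries, uses Pendulum-v1 as a stand-in for the quadrotor environment (the work itself simulates a quadrotor in CoppeliaSim via PyRep), and all network sizes and step counts are illustrative placeholders.

```python
# Minimal sketch of the expert -> GAIL pipeline described in the abstract.
# Assumptions: stable-baselines3 (>= 2.0, gymnasium-compatible) for the PPO
# expert, plain PyTorch for the discriminator; Pendulum-v1 stands in for the
# quadrotor environment used in the thesis.
import gymnasium as gym
import numpy as np
import torch
import torch.nn as nn
from stable_baselines3 import PPO

env = gym.make("Pendulum-v1")

# 1) Train an expert policy with PPO (the on-policy method; SAC would be the
#    off-policy alternative mentioned in the abstract).
expert = PPO("MlpPolicy", env, verbose=0)
expert.learn(total_timesteps=50_000)

# 2) Roll out the expert and record its (state, action) trajectories.
states, actions = [], []
obs, _ = env.reset()
for _ in range(2_000):
    act, _ = expert.predict(obs, deterministic=True)
    states.append(obs)
    actions.append(act)
    obs, _, terminated, truncated, _ = env.step(act)
    if terminated or truncated:
        obs, _ = env.reset()
expert_sa = torch.tensor(
    np.concatenate([np.array(states), np.array(actions)], axis=1),
    dtype=torch.float32,
)

# 3) GAIL-style discriminator: classifies (state, action) pairs as expert
#    (label 1) vs. learner (label 0). The learner is rewarded for pairs the
#    discriminator mistakes for expert ones.
sa_dim = expert_sa.shape[1]
disc = nn.Sequential(nn.Linear(sa_dim, 64), nn.Tanh(), nn.Linear(64, 1))
opt = torch.optim.Adam(disc.parameters(), lr=3e-4)
bce = nn.BCEWithLogitsLoss()

def discriminator_step(learner_sa: torch.Tensor) -> None:
    """One adversarial update: push D toward 1 on expert pairs, 0 on learner pairs."""
    logits = torch.cat([disc(expert_sa), disc(learner_sa)])
    labels = torch.cat([torch.ones(len(expert_sa), 1),
                        torch.zeros(len(learner_sa), 1)])
    opt.zero_grad()
    bce(logits, labels).backward()
    opt.step()

def gail_reward(sa: torch.Tensor) -> torch.Tensor:
    """Surrogate reward -log(1 - D(s, a)): high where D believes 'expert'."""
    with torch.no_grad():
        d = torch.sigmoid(disc(sa)).clamp(max=1 - 1e-6)
        return -torch.log1p(-d)
```

In a full training loop, `discriminator_step` and policy updates driven by `gail_reward` alternate until the learner's state-action distribution is indistinguishable from the expert's; the reward convention shown here (expert = 1) is one common choice among the sign conventions used in GAIL implementations.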

Keywords: Reinforcement Learning, Imitation Learning, Aerial Robots

References

R. S. Sutton, A. G. Barto, Reinforcement Learning: An Introduction, 2nd Edition, MIT Press, Cambridge, MA, USA, 2018.

J. Schulman, F. Wolski, P. Dhariwal, A. Radford, O. Klimov, Proximal policy optimization algorithms, arXiv preprint arXiv:1707.06347 (2017).

T. Haarnoja, A. Zhou, P. Abbeel, S. Levine, Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor, arXiv preprint arXiv:1801.01290 (2018).

T. Haarnoja, A. Zhou, K. Hartikainen, G. Tucker, S. Ha, J. Tan, V. Kumar, H. Zhu, A. Gupta, P. Abbeel, S. Levine, Soft actor-critic algorithms and applications, arXiv preprint arXiv:1812.05905 (2018).

G. C. Lopes, Intelligent control of a quadrotor using reinforcement learning with proximal policy optimization, 2018.

G. Lopes, M. Ferreira, A. Simoes, E. Colombini, Intelligent control of a quadrotor with proximal policy optimization reinforcement learning, in: Latin American Robotic Symposium, 2018, pp. 503–508.

S. James, M. Freese, A. J. Davison, PyRep: Bringing V-REP to deep robot learning, arXiv preprint arXiv:1906.11176 (2019).

E. Rohmer, S. P. N. Singh, M. Freese, CoppeliaSim (formerly V-REP): a versatile and scalable robot simulation framework, in: IEEE IROS, 2013.

Y. Xu, Z. Liu, X. Wang, Monocular Vision based Autonomous Landing of Quadrotor through Deep Reinforcement Learning, in: 2018 37th Chinese Control Conference (CCC), 2018, pp. 10014–10019, ISSN: 1934-1768.

R. Polvara, M. Patacchiola, S. Sharma, J. Wan, A. Manning, R. Sutton, A. Cangelosi, Autonomous Quadrotor Landing using Deep Reinforcement Learning, arXiv preprint arXiv:1709.03339 (2018).

C. Sampedro, A. Rodriguez-Ramos, I. Gil, L. Mejias, P. Campoy, Image-Based Visual Servoing Controller for Multirotor Aerial Robots Using Deep Reinforcement Learning, in: IEEE IROS, IEEE, Madrid, 2018, pp. 979–986.

C. Wang, J. Wang, Y. Shen, X. Zhang, Autonomous Navigation of UAVs in Large-Scale Complex Environments: A Deep Reinforcement Learning Approach, IEEE Transactions on Vehicular Technology 68 (3) (2019) 2124–2136.

B. Zhou, W. Wang, Z. Liu, J. Wang, Vision-based Navigation of UAV with Continuous Action Space Using Deep Reinforcement Learning, in: 2019 Chinese Control And Decision Conference (CCDC), 2019, pp. 5030–5035, ISSN: 1948-9447.

S. Krishnan, B. Borojerdian, W. Fu, A. Faust, V. J. Reddi, Air Learning: An AI Research Platform for Algorithm-Hardware Benchmarking of Autonomous Aerial Robots, arXiv preprint arXiv:1906.00421 (2019).

S. Shah, D. Dey, C. Lovett, A. Kapoor, AirSim: High-fidelity visual and physical simulation for autonomous vehicles, in: Field and Service Robotics, 2017.

S. Li, T. Liu, C. Zhang, D.-Y. Yeung, S. Shen, Learning Unmanned Aerial Vehicle Control for Autonomous Target Following, arXiv preprint arXiv:1709.08233 (2017).

T. Zhang, G. Kahn, S. Levine, P. Abbeel, Learning Deep Control Policies for Autonomous Aerial Vehicles with MPC-Guided Policy Search, arXiv preprint arXiv:1509.06791 (2016).

W. Koch, R. Mancuso, R. West, A. Bestavros, Reinforcement Learning for UAV Attitude Control, ACM Transactions on Cyber-Physical Systems 3 (2) (2019) 1–21.

J. Xu, T. Du, M. Foshey, B. Li, B. Zhu, A. Schulz, W. Matusik, Learning to fly: computational controller design for hybrid UAVs with reinforcement learning, ACM Transactions on Graphics 38 (4) (2019) 1–12.

J. Hwangbo, I. Sa, R. Siegwart, M. Hutter, Control of a quadrotor with reinforcement learning, IEEE Robotics and Automation Letters 2 (4) (2017) 2096–2103.
Published
11/11/2020
BARROS, Gabriel Moraes; COLOMBINI, Esther L. Reinforcement and Imitation Learning Applied to Autonomous Aerial Robot Control. In: CONCURSO DE TESES E DISSERTAÇÕES EM ROBÓTICA - CTDR (MESTRADO) - SIMPÓSIO BRASILEIRO DE ROBÓTICA E SIMPÓSIO LATINO-AMERICANO DE ROBÓTICA (SBR/LARS), 8., 2020, Natal. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2020. p. 73-84. DOI: https://doi.org/10.5753/wtdr_ctdr.2020.14956.