Thrust Vectored Rocket Landing Integrated Guidance and Control with Proximal Policy Optimization
Abstract
This paper presents a 3-Degrees-of-Freedom (3-DoF) rocket landing environment controlled by an agent trained with the Proximal Policy Optimization (PPO) reinforcement learning algorithm. The objectives of this work are to model the dynamics of a rocket and its environment, convert the model into a simulated environment suitable for reinforcement learning, and evaluate PPO training results. This work contributes by implementing realistic models and by contrasting basic implementations of PPO and another advanced reinforcement learning technique. The proposed model is a 3-DoF longitudinal rocket with mass-varying properties, landing gear, and stochastic wind disturbances. The observation space comprises kinematic and contact properties only, which are a subset of all time-varying properties. The action space is composed of three elements: main thruster effort, nozzle angle, and side thruster effort. The reward computation is based on state, fuel consumption, action transitions, and termination status. Simple control techniques are generally not able to stabilize such complex systems. Reinforcement learning is chosen to tackle the complexity of the problem, and PPO is chosen for its theoretical training stability and its handling of continuous observation and action spaces. Training and policy deployment assessments are presented to verify the efficacy of the algorithm and the controllability of the proposed problem.
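For readers unfamiliar with PPO, the training stability mentioned above comes from its clipped surrogate objective, which limits how far a single update can move the policy away from the one that collected the data. The following is a minimal sketch of that objective in NumPy, not the authors' implementation: the function name `ppo_clip_objective`, its argument names, and the clipping value `clip_eps=0.2` (a commonly used default) are assumptions, since the abstract gives no hyperparameters.

```python
import numpy as np


def ppo_clip_objective(log_prob_new, log_prob_old, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate objective (a quantity to maximize).

    All names and the default clip_eps are illustrative assumptions;
    the abstract does not specify them.
    """
    # Probability ratio between the updated policy and the data-collecting policy.
    ratio = np.exp(np.asarray(log_prob_new, dtype=float)
                   - np.asarray(log_prob_old, dtype=float))
    adv = np.asarray(advantages, dtype=float)

    # Unclipped term and the term with the ratio clipped to [1 - eps, 1 + eps].
    unclipped = ratio * adv
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * adv

    # Taking the elementwise minimum gives a pessimistic bound on the improvement.
    return float(np.mean(np.minimum(unclipped, clipped)))
```

When the new and old policies agree, the ratio is 1 and the objective reduces to the mean advantage. When the ratio grows beyond `1 + clip_eps` for a positive advantage, the clipped term takes over and the objective stops rewarding further movement in that direction.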
Keywords:
Training, Rockets, Uncertainty, Heuristic algorithms, Reinforcement learning, Aerospace electronics, Aerodynamics
Published
18/10/2022
How to Cite
SOUZA, Gabriel De Almeida; SILVA, Octávio Mathias; MAXIMO, Marcos R. O. A. Thrust Vectored Rocket Landing Integrated Guidance and Control with Proximal Policy Optimization. In: SIMPÓSIO BRASILEIRO DE ROBÓTICA E SIMPÓSIO LATINO AMERICANO DE ROBÓTICA (SBR/LARS), 19., 2022, São Bernardo do Campo/SP. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2022. p. 55-60.