Thrust Vectored Rocket Landing Integrated Guidance and Control with Proximal Policy Optimization

Gabriel De Almeida Souza; Octávio Mathias Silva; Marcos R. O. A. Maximo

Gabriel De Almeida Souza, ITA
Octávio Mathias Silva, ITA / Bizu Space
Marcos R. O. A. Maximo, Bizu Space

Abstract

This paper presents a 3-degree-of-freedom (3-DoF) rocket landing environment controlled by an agent trained with the Proximal Policy Optimization (PPO) reinforcement learning algorithm. The objectives of this work are to model the dynamics of a rocket and its surroundings, convert the model into a simulated environment suitable for reinforcement learning, and evaluate the PPO training results. The work contributes by implementing realistic models and by contrasting basic implementations of PPO and another advanced reinforcement learning technique. The proposed model is a 3-DoF longitudinal rocket with mass-varying properties, landing gear, and stochastic wind disturbances. The observation space contains kinematic and contact properties only, which are a subset of all time-varying properties. The action space has three elements: main thruster effort, nozzle angle, and side thruster effort. The reward is computed from the state, fuel consumption, action transitions, and termination status. Simple control techniques are generally unable to stabilize such complex systems. Reinforcement learning is chosen to tackle the complexity of the problem, and PPO in particular for its theoretical training stability and its handling of continuous observation and action spaces. Training and policy deployment assessments are presented to verify the efficacy of the algorithm and the controllability of the proposed problem.

Keywords: Training, Rockets, Uncertainty, Heuristic algorithms, Reinforcement learning, Aerospace electronics, Aerodynamics
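
As a rough illustration of the training stability attributed to PPO in the abstract, the sketch below computes the standard clipped surrogate objective from the general PPO formulation. It is not taken from the authors' implementation: the function name `ppo_clip_objective`, its argument names, and the default `clip_eps=0.2` are assumptions chosen for illustration only.

```python
import math


def ppo_clip_objective(log_probs_new, log_probs_old, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate objective (a quantity to be maximized).

    Hypothetical helper for illustration; names and the default clip_eps
    are assumptions, not values reported in the paper.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(log_probs_new, log_probs_old, advantages):
        # Probability ratio between the updated policy and the old policy.
        ratio = math.exp(lp_new - lp_old)
        # Restrict the ratio to the interval [1 - clip_eps, 1 + clip_eps].
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Pessimistic bound: take the smaller of the two surrogate terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

Taking the minimum of the clipped and unclipped terms removes the incentive to move the policy far from the one that collected the data in a single update, which is the usual argument for why PPO updates tend to be stable.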