A Model for Walking Optimization in Biped Robots Using an Inverted Pendulum and Reinforcement Learning

Wesley S. Silva; Josemar Rodrigues de Souza; Ivanoé J. Rodowanski; Marco A. C. Simões

doi:10.5753/erbase.2024.4488

Wesley S. Silva UNEB
Josemar Rodrigues de Souza UNEB
Ivanoé J. Rodowanski UNEB
Marco A. C. Simões UNEB

Abstract

This work focuses on the development of an Inverted Pendulum (IP) prototype with reinforcement learning, together with a complete training environment built on the BahiaRT-GYM platform. The environment was applied to the problem of the agent's torso inclination during walking, enabling training with adjustments that yield stable and fluid locomotion. Torso inclination served as a practical case to demonstrate the environment's ability to support and optimize effective training. The results show a 26% performance gain, with a 27.8% higher speed, for the model trained with reinforcement learning relative to the IP model. Both outperform the original model based on the cart-table approach.
