Learning Push Recovery Strategies for Bipedal Walking
Abstract
This article contributes the implementation of a Push Recovery controller that improves the performance of the walking engine used by a simulated humanoid agent in the RoboCup Soccer 3D Simulation environment. The learned motion policy was able to outperform the baselines with statistical significance. Finally, we propose two approaches for removing undesired biases from our final policies.
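The abstract does not detail how statistical significance against the baselines is assessed, although the reference list below does include bootstrap methods (Efron and Tibshirani, 1986). The sketch below is only an illustration of how such a comparison could be run with a percentile bootstrap confidence interval; the episode returns, sample sizes, and the function bootstrap_mean_diff_ci are hypothetical and not taken from the article.

```python
# Illustrative only: bootstrap comparison of two policies' episode returns.
# All data below are synthetic placeholders, not results from the article.
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_mean_diff_ci(returns_a, returns_b, n_resamples=10_000, alpha=0.05):
    """Percentile bootstrap CI for mean(returns_a) - mean(returns_b)."""
    returns_a = np.asarray(returns_a, dtype=float)
    returns_b = np.asarray(returns_b, dtype=float)
    diffs = np.empty(n_resamples)
    for i in range(n_resamples):
        # Resample each group with replacement and record the mean difference.
        sample_a = rng.choice(returns_a, size=returns_a.size, replace=True)
        sample_b = rng.choice(returns_b, size=returns_b.size, replace=True)
        diffs[i] = sample_a.mean() - sample_b.mean()
    lo, hi = np.percentile(diffs, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lo, hi

# Hypothetical per-episode returns for a learned policy and a baseline.
learned = rng.normal(loc=120.0, scale=15.0, size=100)
baseline = rng.normal(loc=100.0, scale=15.0, size=100)

low, high = bootstrap_mean_diff_ci(learned, baseline)
print(f"95% CI for mean difference: [{low:.2f}, {high:.2f}]")
# If the interval excludes zero, the difference is significant at the 5% level.
```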
References
Bain, M. and Sammut, C. (1995). A framework for behavioural cloning. In Machine Intelligence 15.
Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., and Zhokhov, P. (2017). OpenAI Baselines. https://github.com/openai/baselines.
Efron, B. and Tibshirani, R. (1986). Bootstrap methods for standard errors, confidence intervals, and other measures of statistical accuracy. Statist. Sci., 1(1):54–75.
Goodfellow, I., Bengio, Y., and Courville, A. (2016). Deep Learning. MIT Press.
Goodfellow, I. J., Mirza, M., Xiao, D., Courville, A., and Bengio, Y. (2015). An empirical investigation of catastrophic forgetting in gradient-based neural networks.
Hofmann, A. (2006). Robust execution of bipedal walking tasks from biomechanical principles.
Horak, F., Henry, S., and Shumway-Cook, A. (1997). Postural perturbations: New insights for treatment of balance disorders. Physical Therapy, 77:517–533.
Horak, F. and Macpherson, J. (1996). Postural orientation and equilibrium. In Handbook of Physiology, Exercise: Regulation and Integration of Multiple Systems, pages 255–292. American Physiological Society, MD.
Kitano, H., Asada, M., Kuniyoshi, Y., Noda, I., Osawa, E., and Matsubara, H. (1998). RoboCup: A challenge problem for AI and robotics. In Kitano, H., editor, RoboCup-97: Robot Soccer World Cup I, pages 1–19, Berlin, Heidelberg. Springer Berlin Heidelberg.
Leike, J., Martic, M., Krakovna, V., Ortega, P. A., Everitt, T., Lefrancq, A., Orseau, L., and Legg, S. (2017). AI safety gridworlds.
Maximo, M. R., Colombini, E. L., and Ribeiro, C. H. (2017). Stable and fast model-free walk with arms movement for humanoid robots. International Journal of Advanced Robotic Systems, 14(3):1729881416675135.
Maximo, M. R. O. A. and Ribeiro, C. H. C. (2016). ZMP-Based Humanoid Walking Engine with Arms Movement and Stabilization. In Proceedings of the 2016 Congresso Brasileiro de Automática (CBA), Vitória, ES, Brazil. SBA.
Melo, D. C., Máximo, M. R. O. A., and da Cunha, A. M. (2020). Push recovery strategies through deep reinforcement learning. In 2020 Latin American Robotics Symposium (LARS), 2020 Brazilian Symposium on Robotics (SBR) and 2020 Workshop on Robotics in Education (WRE), pages 1–6.
Melo, L. C. and Maximo, M. R. O. A. (2019). Learning humanoid robot running skills through proximal policy optimization.
Nandi, G., Ijspeert, A., Chakraborty, P., and Nandi, A. (2009). Development of adaptive modular active leg (AMAL) using bipedal robotics technology. Robotics and Autonomous Systems, 57:603–616.
Nashner, L. (1981). Analysis of stance posture in humans.
Nashner, L. M. and McCollum, G. (1985). The organization of human postural movements: A formal basis and experimental synthesis. Behavioral and Brain Sciences, 8(1):135–150.
Oh, J., Singh, S. P., Lee, H., and Kohli, P. (2017). Zero-shot task generalization with multi-task deep reinforcement learning. CoRR, abs/1706.05064.
Parkhi, O. M., Vedaldi, A., and Zisserman, A. (2015). Deep face recognition.
Peng, X. B., Abbeel, P., Levine, S., and van de Panne, M. (2018). DeepMimic: Example-guided deep reinforcement learning of physics-based character skills. ACM Trans. Graph., 37(4).
Peng, X. B. and van de Panne, M. (2017). Learning locomotion skills using DeepRL: Does the choice of action space matter? In Proceedings of the ACM SIGGRAPH / Eurographics Symposium on Computer Animation, SCA ’17, pages 12:1–12:13, New York, NY, USA. ACM.
Schulman, J., Wolski, F., Dhariwal, P., Radford, A., and Klimov, O. (2017). Proximal policy optimization algorithms. CoRR, abs/1707.06347.
Singh, A., Jang, E., Irpan, A., Kappler, D., Dalal, M., Levine, S., Khansari, M., and Finn, C. (2020). Scalable multi-task imitation learning with autonomous improvement.
Stephens, B. (2007). Humanoid push recovery. In 2007 7th IEEE-RAS International Conference on Humanoid Robots, pages 589–595.
Tedrake, R. L. (2004). Applied Optimal Control for Dynamically Stable Legged Locomotion. PhD thesis, Massachusetts Institute of Technology.
Yang, C., Komura, T., and Li, Z. (2017). Emergence of human-comparable balancing behaviours by deep reinforcement learning. In 2017 IEEE-RAS 17th International Conference on Humanoid Robotics (Humanoids), pages 372–377.
Yang, C., Yuan, K., Merkt, W., Komura, T., Vijayakumar, S., and Li, Z. (2018). Learning whole-body motor skills for humanoids. In 2018 IEEE-RAS 18th International Conference on Humanoid Robots (Humanoids), pages 270–276.
Yi, S., Zhang, B., Hong, D., and Lee, D. D. (2013). Online learning of low dimensional strategies for high-level push recovery in bipedal humanoid robots. In 2013 IEEE International Conference on Robotics and Automation, pages 1649–1655.
Yi, S.-J., Zhang, B.-T., Hong, D., and Lee, D. (2011). Online learning of a full body push recovery controller for omnidirectional walking. pages 1–6.
Zhao, W., Queralta, J. P., and Westerlund, T. (2020). Sim-to-real transfer in deep reinforcement learning for robotics: a survey. CoRR, abs/2009.13303.