Learning Push Recovery Strategies for Bipedal Walking
Abstract
This work presents a Push Recovery controller that assists the walking engine of a simulated humanoid robot in the RoboCup Soccer 3D Simulation League environment. The learned movement policies outperformed our original walking engine. We also evaluated the policies, detected undesired biases, and introduced new methodologies to eliminate them.
Keywords:
Robot simulation and visualization tools, Robot planning, communication, adaptation and learning, Robot soccer
Published
2021-10-14
How to Cite
MELO, Dicksiano C.; MAXIMO, Marcos R. O. A.; CUNHA, Adilson Marques da. Learning Push Recovery Strategies for Bipedal Walking. In: GRADUATE WORKS CONTEST IN ROBOTICS - CTDR (MSC) - BRAZILIAN SYMPOSIUM OF ROBOTICS & LATIN AMERICAN ROBOTICS SYMPOSIUM (SBR/LARS), 9., 2021, Online. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2021. p. 70-81. DOI: https://doi.org/10.5753/wtdr_ctdr.2021.18686.
