Short-term Inbetweening of 3D Human Motions

  • Fabio Neves Rocha USP
  • Valdinei Freire USP
  • Karina Valdivia Delgado USP


Creating computer-generated human animations without motion capture technology is a tedious and time-consuming activity. Although there are several publications on animation synthesis using data-driven methods, few are dedicated to the task of inbetweening, which consists of generating transition movements between frames. A modified version of LSTM, called Recurrent Transition Network (RTN), solves the inbetweening task for walking motion based on ten initial frames and two final frames. In this work, we are interested in the short-term inbetweening task, where we need to use the fewest possible frames to generate the missing frames for short-term transitions. We are also interested in different kinds of movements, such as martial arts and Indian dance. Thus, we adapt RTN to require only the first two frames and the last one, yielding a model we call ARTN, and we propose a simple post-processing method, called ARTN+, that combines ARTN with linear interpolation. The results show that the average error of ARTN+ is lower than the average error of each method (RTN and interpolation) separately on the martial arts and Indian dance datasets.
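To make the linear interpolation baseline mentioned above concrete, the following is a minimal sketch in Python. It is not the authors' implementation. It assumes each pose is a flat list of joint coordinates, and the function names `lerp_inbetween` and `mean_l2_error` are hypothetical. The abstract does not define its error metric, so the average per-frame Euclidean distance used here is only an illustrative choice.

```python
from typing import List, Sequence

# Assumption: a pose is a flat list of joint coordinates for one frame.
Pose = List[float]


def lerp_inbetween(start: Sequence[float], end: Sequence[float],
                   n_missing: int) -> List[Pose]:
    """Return n_missing poses evenly spaced between start and end (both excluded)."""
    if len(start) != len(end):
        raise ValueError("start and end poses must have the same dimension")
    if n_missing < 0:
        raise ValueError("n_missing must be non-negative")
    frames: List[Pose] = []
    for k in range(1, n_missing + 1):
        t = k / (n_missing + 1)  # interpolation weight, strictly between 0 and 1
        frames.append([(1.0 - t) * a + t * b for a, b in zip(start, end)])
    return frames


def mean_l2_error(predicted: Sequence[Sequence[float]],
                  target: Sequence[Sequence[float]]) -> float:
    """Average per-frame Euclidean distance (an illustrative metric, not the paper's)."""
    if len(predicted) != len(target):
        raise ValueError("sequences must have the same number of frames")
    if not predicted:
        return 0.0
    total = 0.0
    for p, q in zip(predicted, target):
        total += sum((x - y) ** 2 for x, y in zip(p, q)) ** 0.5
    return total / len(predicted)


if __name__ == "__main__":
    # Three missing frames between two toy 2-D "poses".
    print(lerp_inbetween([0.0, 0.0], [4.0, 8.0], 3))
```

A learned model such as ARTN would replace `lerp_inbetween` as the frame generator. How ARTN+ combines the two outputs is not specified in the abstract, so it is not sketched here.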


How to Cite

ROCHA, Fabio Neves; FREIRE, Valdinei; DELGADO, Karina Valdivia. Short-term Inbetweening of 3D Human Motions. In: ENCONTRO NACIONAL DE INTELIGÊNCIA ARTIFICIAL E COMPUTACIONAL (ENIAC), 18., 2021, Online Event. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2021. p. 583-594. DOI: