Skeleton Image Representation for 3D Action Recognition based on Tree Structure and Reference Joints

  • Carlos A. Caetano, Federal University of Minas Gerais
  • Francois Bremond, Inria Sophia Antipolis, France
  • William R. Schwartz, Federal University of Minas Gerais


In recent years, the computer vision research community has studied how to model temporal dynamics in videos for 3D human action recognition. To that end, two main baseline approaches have been explored: (i) Recurrent Neural Networks (RNNs) with Long Short-Term Memory (LSTM); and (ii) skeleton image representations used as input to a Convolutional Neural Network (CNN). Although RNN approaches achieve excellent results, they lack the ability to efficiently learn the spatial relations between skeleton joints. In contrast, the representations used to feed CNN approaches benefit from the natural ability of CNNs to learn structural information from 2D arrays (i.e., they learn spatial relations between the skeleton joints). To further improve such representations, we introduce the Tree Structure Reference Joints Image (TSRJI), a novel skeleton image representation to be used as input to CNNs. The proposed representation combines the use of reference joints with a tree structure skeleton. While the former incorporates different spatial relationships between the joints, the latter preserves important spatial relations by traversing a skeleton tree with a depth-first order algorithm. Experimental results demonstrate the effectiveness of the proposed representation for 3D action recognition on two datasets, achieving state-of-the-art results on the recent NTU RGB+D 120 dataset.
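
To make the two ingredients named in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes a toy six-joint skeleton with hypothetical joint names, a depth-first traversal that steps back to the parent after each child, and a simple per-frame coordinate difference to each reference joint. The real joint set, choice of reference joints, normalization, and image layout are not given in the abstract and are not reproduced here.

```python
# Toy skeleton tree (assumption): hypothetical joint names, not the NTU RGB+D layout.
SKELETON_TREE = {
    "spine": ["head", "l_shoulder", "r_shoulder"],
    "head": [],
    "l_shoulder": ["l_hand"],
    "l_hand": [],
    "r_shoulder": ["r_hand"],
    "r_hand": [],
}


def depth_first_order(tree, root):
    """Visit joints depth-first, returning to the parent after each child.

    With this ordering, every pair of consecutive entries is an edge of the
    tree, so neighbouring image rows belong to physically connected joints.
    """
    order = [root]
    for child in tree[root]:
        order.extend(depth_first_order(tree, child))
        order.append(root)  # step back to the parent before the next branch
    return order


def reference_joint_image(frames, order, reference):
    """Build one 2D array: rows follow `order`, columns follow time.

    Each cell holds (dx, dy, dz), the offset of a joint from the reference
    joint in that frame (an illustrative choice of spatial relation).
    """
    image = []
    for joint in order:
        row = []
        for frame in frames:
            jx, jy, jz = frame[joint]
            rx, ry, rz = frame[reference]
            row.append((jx - rx, jy - ry, jz - rz))
        image.append(row)
    return image


def build_representation(frames, tree, root, references):
    """Return one image per reference joint, keyed by the reference joint name."""
    order = depth_first_order(tree, root)
    return {ref: reference_joint_image(frames, order, ref) for ref in references}


# Two made-up frames of 3D joint coordinates (illustrative values only).
frames = [
    {"spine": (0.0, 0.0, 0.0), "head": (0.0, 1.0, 0.0),
     "l_shoulder": (-1.0, 0.5, 0.0), "l_hand": (-2.0, 0.0, 0.0),
     "r_shoulder": (1.0, 0.5, 0.0), "r_hand": (2.0, 0.0, 0.0)},
    {"spine": (0.0, 0.0, 0.0), "head": (0.0, 1.0, 0.0),
     "l_shoulder": (-1.0, 0.5, 0.0), "l_hand": (-2.0, 1.0, 0.0),
     "r_shoulder": (1.0, 0.5, 0.0), "r_hand": (2.0, 2.0, 0.0)},
]

order = depth_first_order(SKELETON_TREE, "spine")
images = build_representation(frames, SKELETON_TREE, "spine", ["spine", "head"])
```

Each entry of `images` is a joints-by-frames grid with three channels, which is the kind of 2D array a CNN can consume. Using several reference joints yields several such grids, each encoding a different set of spatial relations.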

Keywords: Convolutional Neural Network (CNN), skeleton joints, skeleton image representation, 3D action recognition


How to Cite

CAETANO, Carlos A.; BREMOND, Francois; SCHWARTZ, William R. Skeleton Image Representation for 3D Action Recognition based on Tree Structure and Reference Joints. In: CONFERENCE ON GRAPHICS, PATTERNS AND IMAGES (SIBGRAPI), 32., 2019, Rio de Janeiro. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2019. DOI: