TensorPose: Real-time pose estimation for interactive applications

Luiz José Schirmer Silva; Luiz Velho; Alberto Raposo; Hélio Côrtes Vieira Lopes; Djalma Lucio Soares da Silva

doi:10.5753/sibgrapi.2019.9814

Luiz José Schirmer Silva PUC-Rio
Luiz Velho IMPA
Alberto Raposo PUC-Rio
Hélio Côrtes Vieira Lopes PUC-Rio
Djalma Lucio Soares da Silva PUC-Rio

Abstract

State-of-the-art multi-stage deep neural networks achieve high accuracy in 2D multi-person pose estimation on images. However, using these models in real-time applications may be impractical, not only because they are computationally intensive but also because they suffer from flickering, from an inability to capture temporal correlations among video frames, and from image degradation. To tackle these problems, we extend pose estimation to motion capture in interactive applications. To this end, we propose TensorPose, a novel deep neural network with a streamlined architecture and tensor decomposition that improves processing time for pose estimation. We introduce an architecture for markerless motion capture that combines Convolutional Neural Networks with sparse optical flow and Kalman filters. We also apply this architecture in a multi-user environment based on the Holojam framework, in which simultaneous collaborative experiences can be created.
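
As an illustration of how a Kalman filter can damp frame-to-frame flickering in detected keypoints, the following is a minimal sketch and not the authors' implementation. It assumes a constant-velocity motion model for a single 2D keypoint; the class name `KeypointKalman`, the helper `smooth_track`, and the noise parameters `process_var` and `meas_var` are hypothetical choices made for this example.

```python
import numpy as np


class KeypointKalman:
    """Constant-velocity Kalman filter for one 2D keypoint (illustrative sketch)."""

    def __init__(self, dt=1.0, process_var=1e-2, meas_var=4.0):
        # State vector: [x, y, vx, vy]; the motion model is an assumption.
        self.F = np.array([[1, 0, dt, 0],
                           [0, 1, 0, dt],
                           [0, 0, 1, 0],
                           [0, 0, 0, 1]], dtype=float)
        # Only the position is observed (e.g. a CNN keypoint detection).
        self.H = np.array([[1, 0, 0, 0],
                           [0, 1, 0, 0]], dtype=float)
        self.Q = process_var * np.eye(4)  # process noise covariance
        self.R = meas_var * np.eye(2)     # measurement noise covariance
        self.x = None                     # state, set on the first measurement
        self.P = 100.0 * np.eye(4)        # large initial uncertainty

    def update(self, z):
        """Feed one measured (x, y) position and return the filtered position."""
        z = np.asarray(z, dtype=float)
        if self.x is None:
            # Initialise at the first detection with zero velocity.
            self.x = np.array([z[0], z[1], 0.0, 0.0])
            return self.x[:2].copy()
        # Predict step.
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        # Correct step.
        innovation = z - self.H @ self.x
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ innovation
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.x[:2].copy()


def smooth_track(points, **kwargs):
    """Filter a sequence of (x, y) detections of the same keypoint."""
    kf = KeypointKalman(**kwargs)
    return np.array([kf.update(p) for p in points])
```

In a full pipeline, one such filter would typically be kept per joint and per tracked person. Raising `meas_var` relative to `process_var` gives smoother but more sluggish trajectories, so the values would need tuning for the camera and frame rate in use.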

Keywords: Convolutional neural networks, Pose estimation, Real-time applications
