# TensorPose: Real-time pose estimation for interactive applications

### Abstract

State-of-the-art multi-stage Deep Neural Networks achieve highly accurate 2D multi-person pose estimation on still images. However, using these models in real-time applications may be impractical. They are computationally intensive, and they also suffer from flickering, fail to capture temporal correlations among video frames, and are affected by image degradation. To tackle these problems, we extend pose estimation to motion capture in interactive applications. To do so, we propose TensorPose, a novel deep neural network for pose estimation that combines a streamlined architecture with tensor decomposition to reduce processing time. We introduce an architecture for markerless motion capture that combines Convolutional Neural Networks with sparse optical flow and Kalman filters. We also apply this architecture in a multi-user environment based on the Holojam framework, where users can take part in simultaneous collaborative experiences.

**Keywords:** Convolutional neural networks, Pose estimation, Real-time applications
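The abstract credits part of TensorPose's speed-up to tensor decomposition but does not say which decomposition is used. One widely used option for compressing a convolution is a Tucker-2 decomposition over the input- and output-channel modes, computed with a truncated higher-order SVD (HOSVD). The NumPy sketch below assumes that scheme; it is not necessarily TensorPose's exact method, and every function name in it is hypothetical. It factors a `(C_out, C_in, kH, kW)` kernel into a 1×1 convolution, a smaller kH×kW convolution and a second 1×1 convolution, then compares weight counts.

```python
import numpy as np


def unfold(tensor: np.ndarray, mode: int) -> np.ndarray:
    """Mode-n unfolding: bring axis `mode` to the front and flatten the remaining axes."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def mode_product(tensor: np.ndarray, matrix: np.ndarray, mode: int) -> np.ndarray:
    """n-mode product: multiply `matrix` (shape J x I_mode) along axis `mode` of `tensor`."""
    moved = np.moveaxis(tensor, mode, 0)
    result = np.tensordot(matrix, moved, axes=(1, 0))
    return np.moveaxis(result, 0, mode)


def tucker2_conv_kernel(kernel: np.ndarray, rank_out: int, rank_in: int):
    """Truncated HOSVD over the channel modes of a (C_out, C_in, kH, kW) kernel.

    Hypothetical helper (sketch). Only mode 0 (output channels) and mode 1
    (input channels) are compressed; spatial modes keep full size ("Tucker-2").
    """
    u_out = np.linalg.svd(unfold(kernel, 0), full_matrices=False)[0][:, :rank_out]
    u_in = np.linalg.svd(unfold(kernel, 1), full_matrices=False)[0][:, :rank_in]
    # Core tensor of shape (rank_out, rank_in, kH, kW).
    core = mode_product(mode_product(kernel, u_out.T, 0), u_in.T, 1)
    return core, u_out, u_in


def tucker2_reconstruct(core, u_out, u_in) -> np.ndarray:
    """Rebuild an approximate (C_out, C_in, kH, kW) kernel from the factors."""
    return mode_product(mode_product(core, u_out, 0), u_in, 1)


def conv_params(c_out: int, c_in: int, k: int) -> int:
    """Weights in a dense k x k convolution (bias ignored)."""
    return c_out * c_in * k * k


def tucker2_params(c_out: int, c_in: int, k: int, rank_out: int, rank_in: int) -> int:
    """Weights in the factored chain: 1x1 (C_in->R_in), kxk (R_in->R_out), 1x1 (R_out->C_out)."""
    return c_in * rank_in + rank_in * rank_out * k * k + rank_out * c_out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    kernel = rng.standard_normal((64, 32, 3, 3))  # random stand-in for a trained kernel
    core, u_out, u_in = tucker2_conv_kernel(kernel, rank_out=16, rank_in=8)
    approx = tucker2_reconstruct(core, u_out, u_in)
    rel_err = np.linalg.norm(kernel - approx) / np.linalg.norm(kernel)
    print(core.shape)                        # (16, 8, 3, 3)
    print(conv_params(64, 32, 3))            # 18432
    print(tucker2_params(64, 32, 3, 16, 8))  # 2432
    print(f"relative error: {rel_err:.3f}")
```

In this sketch the ranks `rank_out` and `rank_in` are free parameters that trade reconstruction error for fewer weights (here about 7.6 times fewer). The fine-tuning that is commonly applied after decomposing a trained network is not shown.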


*In*: CONFERENCE ON GRAPHICS, PATTERNS AND IMAGES (SIBGRAPI), 32., 2019, Rio de Janeiro.

**Proceedings** [...]. Porto Alegre: Sociedade Brasileira de Computação, 2019. DOI: https://doi.org/10.5753/sibgrapi.2019.9814.