Human Pose Regression by Combining Indirect Part Detection and Contextual Information

Diogo Luvizon; Hedi Tabia; David Picard

doi:10.5753/sibgrapi.2019.9813

Diogo Luvizon, Paris Seine University
Hedi Tabia, Paris Seine University
David Picard, Paris Seine University


Abstract

In this paper, we tackle the problem of human pose estimation from still images, a very active topic owing to its many applications, from image annotation to human-machine interfaces. We use the soft-argmax function to convert feature maps directly to body joint coordinates, resulting in a fully differentiable framework. Our method learns heat map representations indirectly, without the additional step of generating artificial ground truth. Consequently, contextual information can be included in the pose predictions in a seamless way. We evaluated our method on two challenging datasets, the Leeds Sports Poses (LSP) and the MPII Human Pose datasets, reaching the best performance among all existing regression methods.
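To make the soft-argmax step concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It applies a spatial softmax to a single 2D feature map and returns the expected (x, y) position, which is what makes the map-to-coordinate conversion differentiable. The function name `soft_argmax_2d`, the toy map, and the normalization of coordinates to [0, 1] are assumptions made for illustration.

```python
import math


def soft_argmax_2d(feature_map):
    """Return the expected (x, y) position of a 2D map under a spatial softmax.

    Coordinates are normalized to [0, 1]; this convention is an assumption
    of the sketch, not something stated in the abstract.
    """
    height = len(feature_map)
    width = len(feature_map[0])

    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(max(row) for row in feature_map)
    weights = [[math.exp(v - peak) for v in row] for row in feature_map]
    total = sum(sum(row) for row in weights)

    x = 0.0
    y = 0.0
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            p = w / total  # softmax probability of this cell
            x += p * (c / (width - 1) if width > 1 else 0.0)
            y += p * (r / (height - 1) if height > 1 else 0.0)
    return x, y


# Hypothetical 5x5 feature map with one strong activation at row 1, column 3.
toy_map = [[0.0] * 5 for _ in range(5)]
toy_map[1][3] = 20.0
x, y = soft_argmax_2d(toy_map)
```

Because the output is a probability-weighted average rather than a hard argmax, gradients of a coordinate loss can flow back into every cell of the feature map, so a heat-map-like response can emerge without explicit heat map supervision.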

Keywords: Human pose estimation, Neural nets, Computer vision
