3D Human Pose Estimation Based on Monocular RGB Images and Domain Adaptation

  • João Renato Ribeiro Manesco, UNESP
  • Stefano Berretti, University of Florence
  • Aparecido Nilceu Marana, UNESP

Abstract


Human pose estimation from monocular images is a challenging problem in Computer Vision. While 2D poses already find extensive application, the use of 3D poses suffers from data scarcity because 3D annotations are difficult to acquire. Fully convolutional approaches therefore struggle with the limited supply of 3D pose labels, which has prompted a two-step strategy that leverages 2D pose estimators. This strategy, however, does not generalize well to unseen poses and calls for domain adaptation techniques. In this work, we introduce a novel Domain Unified Approach, called DUA, which combines three modules on top of the pose estimator (a pose converter, an uncertainty estimator, and a domain classifier) to improve the accuracy of 3D poses estimated from 2D poses. In experiments with the SURREAL and Human3.6M datasets, our method reduced the mean per-joint position error (MPJPE) by 44.1 mm in the synthetic-to-real scenario. Furthermore, it outperformed all state-of-the-art methods in the real-to-synthetic scenario.
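For readers unfamiliar with the metric, MPJPE is the Euclidean distance between each estimated joint and its ground-truth position, averaged over all joints. The snippet below is a minimal sketch of that definition only; the function name `mpjpe`, the three-joint poses, and their coordinates are hypothetical illustrations rather than values from the paper, and any pose alignment step used in a particular evaluation protocol is left out because the abstract does not specify one.

```python
import math


def mpjpe(predicted, target):
    """Mean per-joint position error between two poses.

    Each pose is a sequence of (x, y, z) joint positions given in the same
    unit and the same joint order.
    """
    if len(predicted) != len(target) or not predicted:
        raise ValueError("poses must be non-empty and have the same number of joints")
    # Euclidean distance for each joint, then the average over all joints.
    distances = [math.dist(p, t) for p, t in zip(predicted, target)]
    return sum(distances) / len(distances)


# Hypothetical 3-joint poses in millimetres (illustrative values only).
ground_truth = [(0.0, 0.0, 0.0), (0.0, 100.0, 0.0), (0.0, 200.0, 0.0)]
estimate = [(3.0, 4.0, 0.0), (0.0, 100.0, 0.0), (0.0, 200.0, 12.0)]

error_mm = mpjpe(estimate, ground_truth)
print(f"MPJPE: {error_mm:.2f} mm")
```

In this toy case the per-joint distances are 5 mm, 0 mm, and 12 mm, so the mean is 17/3 mm. Over a dataset, the same quantity is typically averaged across all evaluated poses.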

Published
30 September 2024
MANESCO, João Renato Ribeiro; BERRETTI, Stefano; MARANA, Aparecido Nilceu. 3D Human Pose Estimation Based on Monocular RGB Images and Domain Adaptation. In: WORKSHOP DE TESES E DISSERTAÇÕES - CONFERENCE ON GRAPHICS, PATTERNS AND IMAGES (SIBGRAPI), 37., 2024, Manaus/AM. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2024. p. 29-34. DOI: https://doi.org/10.5753/sibgrapi.est.2024.31641.
