Pointing Gesture Recognition from 3D Human Skeleton Data
Abstract
Gesture recognition in Human-Machine Interaction (HMI) refers to the automatic detection of human gestures, assigning semantic meaning to physical movements and enabling interaction with computers and robots, as well as the analysis of human behavior. This work proposes a method for static gesture recognition, focused on the “pointing” gesture, based on 3D human skeletons reconstructed by a multi-camera system. The objective is to automatically detect the presence or absence of the gesture using spatial pose data derived from the three-dimensional reconstruction of the human body from multiple viewpoints. After manual annotation and segmentation of the gesture sequences, structural features were extracted from the skeletons. A normalization step was applied, translating each skeleton to the origin and rotating it to align with the X-axis. Classical supervised machine learning models were then used to classify body poses: Logistic Regression, Random Forest, and Decision Tree. Experiments were carried out using Leave-One-Subject-Out (LOSO) cross-validation. The results demonstrate the viability of the proposed approach for applications in intelligent environments, where recognizing the pointing gesture can be used to indicate goals, highlight objects of interest, or define target positions for mobile robots.
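As an illustration of the pipeline described above, the sketch below shows one way the normalization (translation of a root joint to the origin and rotation about the vertical axis so the body faces along the X-axis) and the LOSO evaluation of the three classifiers could be implemented with NumPy and scikit-learn. It is a minimal sketch under stated assumptions, not the authors' implementation: the joint indices, the choice of flattened joint coordinates as features, and the synthetic data are placeholders for illustration only.

# Minimal sketch (assumptions: joint indices, flattened-coordinate features,
# synthetic data). Not the authors' code.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

PELVIS, L_SHOULDER, R_SHOULDER = 0, 5, 6  # hypothetical joint indices

def normalize_skeleton(joints: np.ndarray) -> np.ndarray:
    """Translate the skeleton to the origin and rotate it about the
    vertical (Z) axis so the shoulder line is aligned with the X-axis."""
    centered = joints - joints[PELVIS]                 # translation to the origin
    shoulder = centered[R_SHOULDER] - centered[L_SHOULDER]
    angle = np.arctan2(shoulder[1], shoulder[0])       # yaw of the shoulder line
    c, s = np.cos(-angle), np.sin(-angle)
    rot_z = np.array([[c, -s, 0.0],
                      [s,  c, 0.0],
                      [0.0, 0.0, 1.0]])
    return centered @ rot_z.T                          # rotation about Z

# Synthetic example: 200 skeletons (17 joints each) from 5 subjects.
rng = np.random.default_rng(0)
skeletons = rng.normal(size=(200, 17, 3))
X = np.stack([normalize_skeleton(s).ravel() for s in skeletons])
y = rng.integers(0, 2, size=200)        # 1 = pointing, 0 = not pointing
groups = rng.integers(0, 5, size=200)   # subject id of each sample

# Leave-One-Subject-Out: each fold holds out all samples of one subject.
loso = LeaveOneGroupOut()
for name, clf in [("LogisticRegression", LogisticRegression(max_iter=1000)),
                  ("RandomForest", RandomForestClassifier(n_estimators=100)),
                  ("DecisionTree", DecisionTreeClassifier())]:
    scores = cross_val_score(clf, X, y, cv=loso, groups=groups)
    print(f"{name}: mean accuracy {scores.mean():.2f} over {len(scores)} subjects")

Grouping the folds by subject, rather than by random sample, is what makes the evaluation test generalization to people never seen during training.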
References
C. C. Santos, L. C. Cosmi, A. P. d. Carmo, J. Samatelo, J. Santos-Victor, and R. F. Vassallo, “Reconhecimento online de gestos dinâmicos para ambientes interacionais multicâmeras,” in XV Simpósio Brasileiro de Automação Inteligente (SBAI 2021), 2021.
A. Osipov and M. Ostanin, “Real-time static custom gestures recognition based on skeleton hand,” in 2021 International Conference “Nonlinearity, Information and Robotics” (NIR), 2021, pp. 1–4.
L. C. Cosmi Filho, M. D. d. Oliveira, M. N. Lucas, L. F. Follador, and R. F. Vassallo, “An approach based on a programmable intelligent space for human-robot interaction,” in 2024 Latin American Robotics Symposium (LARS), 2024, pp. 1–6.
V. Lorentz, M. Weiss, K. Hildebrand, and I. Boblan, “Pointing gestures for human-robot interaction with the humanoid robot digit,” in 2023 32nd IEEE International Conference on Robot and Human Interactive Communication (RO-MAN), 2023, pp. 1886–1892.
M. Čorňák, M. Tölgyessy, and P. Hubinský, “Innovative collaborative method for interaction between a human operator and robotic manipulator using pointing gestures,” Applied Sciences, vol. 12, no. 1, 2022. [Online]. Available: [link]
A. C. S. Medeiros, P. Ratsamee, Y. Uranishi, T. Mashita, and H. Takemura, “Human-drone interaction: Using pointing gesture to define a target object,” in Human-Computer Interaction. Multimodal and Natural Interaction, M. Kurosu, Ed. Cham: Springer International Publishing, 2020, pp. 688–705.
H. Chu, J.-H. Lee, Y.-C. Lee, C.-H. Hsu, J.-D. Li, and C.-S. Chen, “Part-aware measurement for robust multi-view multi-human 3d pose estimation and tracking,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 1472–1481.
G. Jocher and J. Qiu, “Ultralytics yolo11,” 2024. [Online]. Available: [link]
T. Jiang, P. Lu, L. Zhang, N. Ma, R. Han, C. Lyu, Y. Li, and K. Chen, “RTMPose: Real-time multi-person pose estimation based on mmpose,” arXiv preprint arXiv:2303.07399, 2023. [Online]. Available: [link]
L. Breiman, “Random forests,” Machine Learning, vol. 45, pp. 5–32, 2001.
D. W. Hosmer Jr, S. Lemeshow, and R. X. Sturdivant, Applied logistic regression. John Wiley & Sons, 2013.
L. Breiman, J. Friedman, R. A. Olshen, and C. J. Stone, Classification and regression trees. Chapman and Hall/CRC, 2017.
Q. F. Gronau and E.-J. Wagenmakers, “Limitations of bayesian leave-one-out cross-validation for model selection,” Computational brain & behavior, vol. 2, no. 1, pp. 1–11, 2019.
