Dynamic Sign Language Recognition Based on Convolutional Neural Networks and Texture Maps

Edwin J. Escobedo Cardenas; Lourdes Ramirez Cerna; Guillermo Camara-Chavez

doi:10.5753/sibgrapi.2019.9790

Edwin J. Escobedo Cardenas, Federal University of Ouro Preto
Lourdes Ramirez Cerna, Federal University of Ouro Preto
Guillermo Camara-Chavez, Federal University of Ouro Preto

Abstract

Sign language recognition (SLR) is challenging because it is difficult to learn or design descriptors that represent its primary parameters (location, movement, and hand configuration). In this paper, we propose a robust deep-learning-based method for sign language recognition. Our approach represents multimodal (RGB-D) information through texture maps that describe hand location and movement. We also introduce an intuitive method to extract a representative frame that describes the hand shape. We then use this information as input to three-stream and two-stream CNN models, which learn robust features capable of recognizing a dynamic sign. We conduct experiments on two sign language datasets, and the comparison with state-of-the-art SLR methods shows that our approach, which combines texture maps and hand shape, outperforms them on SLR tasks.
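As a rough illustration of how hand location and movement can be packed into a single image that a CNN can consume, the sketch below rasterizes a 2D hand trajectory into a small map whose pixel values grow with time. The function name `trajectory_texture_map`, the normalized coordinates, and the time-coded intensity are assumptions made for illustration; this is not the texture-map encoding used in the paper.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # normalized (x, y), each expected in [0, 1]


def trajectory_texture_map(points: Sequence[Point], size: int = 32) -> List[List[float]]:
    """Rasterize a hand trajectory into a size x size map (illustrative only).

    Each visited pixel stores a value in (0, 1] that grows with the frame
    index, so the map encodes where the hand was (location) and in what
    order it moved (movement).
    """
    if not points:
        raise ValueError("trajectory must contain at least one point")
    image = [[0.0] * size for _ in range(size)]
    n = len(points)
    for t, (x, y) in enumerate(points):
        # Map normalized coordinates to pixel indices, clamped to the grid.
        col = min(size - 1, max(0, int(x * size)))
        row = min(size - 1, max(0, int(y * size)))
        # Later frames overwrite earlier ones with a larger value.
        image[row][col] = (t + 1) / n
    return image


# Hypothetical trajectory: the hand moves diagonally across the frame,
# sampled at the center of each of 10 cells.
trajectory = [((i + 0.5) / 10, (i + 0.5) / 10) for i in range(10)]
texture = trajectory_texture_map(trajectory, size=10)
```

In a multi-stream setup, a map like this would feed one stream while a representative hand-shape frame feeds another; how the streams are fused is a separate design choice not shown here.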

Keywords: CNN, sign language, texture maps
