HandArch: A deep learning architecture for LIBRAS hand configuration recognition

  • Gabriel Peixoto de Carvalho, UFABC
  • André Luiz Brandão, UFABC
  • Fernando Teubl Ferreira, UFABC

Abstract

Despite recent advances in deep learning, sign language recognition remains a challenging computer vision problem due to the complexity of its shape and movement patterns. Current studies on sign language recognition treat hand pose recognition as an image classification problem. Following this approach, we introduce HandArch, a novel architecture for real-time hand pose recognition from video, intended to accelerate the development of sign language recognition applications. We also present Libras91, a novel dataset of Brazilian sign language (LIBRAS) hand configurations containing 91 classes and 108,896 samples. Experimental results show that our approach surpasses the accuracy of previous studies while running in real time on video files. Our system achieves 99% recognition accuracy on the new dataset and over 95% on other hand pose datasets.
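The abstract frames hand configuration recognition as image classification over the 91 Libras91 classes, with inference run frame by frame on video. The sketch below illustrates that formulation only and is not the actual HandArch architecture: the layer widths, the 64x64 input resolution, and the sample.mp4 file name are illustrative assumptions, using TensorFlow/Keras and OpenCV.

```python
# Minimal sketch of the image-classification formulation described in the
# abstract, NOT the published HandArch architecture. Layer sizes, input
# resolution, and file names are assumptions for illustration.
import cv2
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 91           # Libras91 defines 91 hand-configuration classes
INPUT_SHAPE = (64, 64, 3)  # assumed crop resolution; not specified in the abstract

def build_classifier() -> tf.keras.Model:
    """Plain CNN classifier: conv/pool blocks followed by a softmax head."""
    return models.Sequential([
        layers.Input(shape=INPUT_SHAPE),
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dense(NUM_CLASSES, activation="softmax"),
    ])

def classify_video(path: str, model: tf.keras.Model) -> None:
    """Frame-by-frame inference over a video file, as in the real-time setting."""
    cap = cv2.VideoCapture(path)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        # In a full pipeline the hand region would be detected and cropped
        # first; here the whole frame stands in for that crop.
        img = cv2.resize(frame, INPUT_SHAPE[:2]).astype(np.float32) / 255.0
        probs = model.predict(img[np.newaxis], verbose=0)[0]
        print(f"class {int(np.argmax(probs))} (p={probs.max():.2f})")
    cap.release()

if __name__ == "__main__":
    model = build_classifier()
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    classify_video("sample.mp4", model)  # hypothetical video file
```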

Keywords: Sign Language Recognition, LIBRAS, Deep Learning, Software Architecture, Hand Configurations

Published
22/11/2021
How to Cite

CARVALHO, Gabriel Peixoto de; BRANDÃO, André Luiz; FERREIRA, Fernando Teubl. HandArch: A deep learning architecture for LIBRAS hand configuration recognition. In: WORKSHOP DE VISÃO COMPUTACIONAL (WVC), 17., 2021, Online. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2021. p. 19-24. DOI: https://doi.org/10.5753/wvc.2021.18883.