Automatic Time-aware Recognition of Brazilian Sign Language Based on Dynamic Time Warping
Abstract
Brazilian Sign Language (Libras) is a crucial communication medium for the deaf community in Brazil, yet its automatic recognition and translation remain challenging. This paper presents a novel approach that uses Fast Dynamic Time Warping (FastDTW) to recognize Libras signs in video streams, with the aim of bridging the communication gap between deaf and hearing individuals, improving accessibility, and reducing social marginalization. The methodology uses MediaPipe to extract key hand and body landmarks, from which angular features are computed for sign recognition. Experiments on the MINDS-Libras dataset show high recognition accuracy, outperforming traditional methods. Furthermore, when applied to the INCLUDE-50 dataset, which contains signs from a different sign language (Indian Sign Language), the proposed model performs competitively without relying on deep learning techniques.
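The pipeline summarized above (landmarks, then angular features, then time-warped comparison) can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: landmarks are already available as (x, y, z) tuples per frame (MediaPipe itself is not called), the joint triplets and all function names (`joint_angle`, `angle_series`, `dtw_distance`, `classify`, `arm_frame`, `bend`) are hypothetical, exact O(n·m) DTW stands in for FastDTW, and a 1-nearest-neighbour rule is assumed as the decision step. The authors' actual feature set and classifier may differ. Requires Python 3.8+ for `math.dist` and multi-argument `math.hypot`.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]  # one landmark as (x, y, z)
Series = List[List[float]]          # one feature vector per frame


def joint_angle(a: Point, b: Point, c: Point) -> float:
    """Angle in radians at landmark b, between the segments b->a and b->c."""
    u = [a[i] - b[i] for i in range(3)]
    v = [c[i] - b[i] for i in range(3)]
    nu, nv = math.hypot(*u), math.hypot(*v)
    if nu == 0.0 or nv == 0.0:
        return 0.0  # degenerate case: coincident landmarks
    cos_t = sum(p * q for p, q in zip(u, v)) / (nu * nv)
    return math.acos(max(-1.0, min(1.0, cos_t)))  # clamp rounding error


def angle_series(frames: Sequence[Sequence[Point]],
                 triplets: Sequence[Tuple[int, int, int]]) -> Series:
    """Map each frame's landmarks to a vector of joint angles (one per triplet)."""
    return [[joint_angle(f[i], f[j], f[k]) for i, j, k in triplets] for f in frames]


def dtw_distance(x: Series, y: Series) -> float:
    """Exact DTW with Euclidean frame cost; FastDTW approximates this value."""
    n, m = len(x), len(y)
    acc = [[math.inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(x[i - 1], y[j - 1])
            # extend the cheapest of: insertion, deletion, match
            acc[i][j] = cost + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    return acc[n][m]


def classify(query: Series, templates: Sequence[Tuple[str, Series]]) -> str:
    """Assumed 1-nearest-neighbour rule: label of the template closest under DTW."""
    label, _ = min(templates, key=lambda t: dtw_distance(query, t[1]))
    return label


# --- Usage on synthetic data: a hypothetical three-landmark "arm" ---
TRIPLETS = [(0, 1, 2)]  # angle at landmark 1 between landmarks 0 and 2


def arm_frame(theta: float) -> List[Point]:
    """Three landmarks whose angle at the middle one equals theta (0..pi)."""
    return [(1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (math.cos(theta), math.sin(theta), 0.0)]


def bend(n_frames: int) -> Series:
    """Arm bending from straight (pi) to a right angle (pi/2) over n_frames."""
    thetas = [math.pi - (math.pi / 2) * i / (n_frames - 1) for i in range(n_frames)]
    return angle_series([arm_frame(t) for t in thetas], TRIPLETS)


templates = [
    ("bend", bend(5)),
    ("straight", angle_series([arm_frame(math.pi)] * 5, TRIPLETS)),
]
query = bend(8)  # the same movement performed more slowly
print(classify(query, templates))  # prints: bend
```

The slower 8-frame query still matches the 5-frame "bend" template because DTW lets one template frame align with several query frames. Exact DTW costs O(n·m) time and memory per comparison; FastDTW trades a small approximation error for linear time and space, which matters when every query is compared against many templates. Landmark normalization and warping-window constraints are omitted here.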
Keywords:
Computer Vision, Sign Language Recognition, Gesture Recognition, Dynamic Time Warping, MediaPipe, Libras, Brazilian Sign Language
Published
14/10/2024
How to Cite
ARCANJO, Lucas de S.; COELHO, Lucas F.; GUIMARÃES, Silvio Jamil F.; PATROCÍNIO JR, Zenilton K. G. do; CARDOSO, Leonardo Vilela. Automatic Time-aware Recognition of Brazilian Sign Language Based on Dynamic Time Warping. In: BRAZILIAN SYMPOSIUM ON MULTIMEDIA AND THE WEB (WEBMEDIA), 30., 2024, Juiz de Fora/MG. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2024. p. 72-79. DOI: https://doi.org/10.5753/webmedia.2024.243245.