Towards Generating Digital LIBRAS Signers on Mobile Devices

Wellington Silveira, Andrew Alaniz, Marina Hurtado, Luca Mendonça, Rodrigo de Bem

Affiliation (all authors): FURG

Abstract

Sign languages are an effective means of communicating with people who have some degree of hearing impairment. However, only a relatively small number of people master these languages. In this context, assistive technologies are a valuable ally for social inclusion. Approaches based on computer graphics and deep learning have emerged as an effective, non-intrusive way to perform sign language recognition (SLR), translation (SLT), and production (SLP). One of the tasks these techniques can carry out is pose transfer, which lets digital signers reenact sign language. Although the literature contains methods that address this problem, few of them tackle the deployment of such models on platforms accessible to real users. Therefore, in this paper, we propose adapting and implementing a deep generative model for Brazilian Sign Language (LIBRAS) production on a mobile device. This effort is a step toward generating synthetic digital LIBRAS signers that could enhance real-time communication with hearing-impaired people.

Keywords: Digital humans, Pose transfer, Sign language production, Deep generative models
