Automatic 3D animation generation for Sign Language: A case study of Sign Language Machine Translation

Luisa Martins; Stênio Ferreira; Tiago Maritan

doi:10.5753/webmedia.2025.16135

Luisa Martins UFPB
Stênio Ferreira UFPB
Tiago Maritan UFPB

Abstract

Digital accessibility remains a critical challenge for the Deaf community, especially in Brazil, where making Brazilian Sign Language (Libras) content universally available across different media is still an open problem. Traditional animation pipelines for Sign Language videos, including those used in the VLibras suite, are often slow, costly, and limited in expressiveness. This work presents a novel approach that automates the transformation of human-generated Libras videos into expressive 3D animations using deep learning-based motion transfer techniques. The approach is intended to be applied in the VLibras workflow and to extend its capabilities, substantially reducing implementation time and cost. The system leverages recent advances in motion capture, pose estimation, and animation retargeting to enable efficient, real-time animation of human-based 3D models. The study discusses the system architecture, the automation pipeline, and the potential value for the Deaf community, assessed through a survey and evaluation process. Results indicate that the approach improved realism, provided a more immersive experience, accelerated development, and potentially offers higher quality than traditional methods. Future work on fine-tuning the model and on generating enhanced anthropomorphic 3D models with better collision detection can enable a scalable path toward more inclusive media content for the Deaf population in Brazil.

Keywords: Animation, accessibility, motion capture, Brazilian Sign Language, avatar rendering, automation
