A Lightweight I3D-Based Approach for Real-Time Brazilian Sign Language Recognition
Abstract
Brazil has a significant need for accessible technologies that support the deaf and hard-of-hearing community. However, many state-of-the-art deep learning models are too computationally intensive for practical, real-time use. This study addresses that gap by proposing a lightweight pipeline for isolated Brazilian Sign Language (LIBRAS) recognition. We fine-tune a pre-trained Inflated 3D ConvNet (I3D) model on the MINDS-Libras dataset with an end-to-end methodology that operates directly on raw RGB videos, avoiding heavy pre-processing steps such as skeleton extraction. To evaluate the model's generalization realistically, we adopt a strict signer-independent protocol in which test subjects are completely unseen during training. The proposed model achieves 92.5% accuracy and recognizes signs in real time, performing comparably to more complex architectures. This work establishes a new benchmark for signer-independent LIBRAS recognition and shows that an end-to-end approach can balance high accuracy with the pipeline efficiency required for deployable, real-world accessibility tools.
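The signer-independent protocol described above can be illustrated with a minimal sketch. It assumes each sample is a `(video_path, label, signer_id)` tuple; the file names, labels, signer IDs, and the helper `signer_independent_split` are hypothetical and are not taken from the paper or from the MINDS-Libras distribution.

```python
def signer_independent_split(samples, test_signers):
    """Partition samples so that no test signer appears in the training set.

    samples: iterable of (video_path, label, signer_id) tuples (assumed layout).
    test_signers: signer IDs held out entirely for testing.
    """
    held_out = set(test_signers)
    train, test = [], []
    for sample in samples:
        signer_id = sample[2]
        if signer_id in held_out:
            test.append(sample)
        else:
            train.append(sample)
    return train, test


# Hypothetical toy data: three signers, two sign classes.
samples = [
    ("signer1_sign_a.mp4", "sign_a", 1),
    ("signer1_sign_b.mp4", "sign_b", 1),
    ("signer2_sign_a.mp4", "sign_a", 2),
    ("signer2_sign_b.mp4", "sign_b", 2),
    ("signer3_sign_a.mp4", "sign_a", 3),
    ("signer3_sign_b.mp4", "sign_b", 3),
]

train, test = signer_independent_split(samples, test_signers={3})

# The defining property of the protocol: signer sets are disjoint.
train_ids = {signer for _, _, signer in train}
test_ids = {signer for _, _, signer in test}
assert train_ids.isdisjoint(test_ids)
```

Splitting by signer rather than by video is what keeps a held-out subject's appearance and signing style out of training, so the reported accuracy reflects performance on people the model has never seen.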
Keywords:
Brazilian Sign Language, LIBRAS, Real-Time Recognition, Deep Learning, I3D, Signer-Independent
Published
10/11/2025
How to Cite
COSTA, Victor; TAVARES, Leandro; CONCEIÇÃO, Ruhan; AGOSTINI, Luciano; SANTANA, Brenda Salenave; LEBEDEFF, Tatiana Bolivar; CORRÊA, Guilherme. A Lightweight I3D-Based Approach for Real-Time Brazilian Sign Language Recognition. In: BRAZILIAN SYMPOSIUM ON MULTIMEDIA AND THE WEB (WEBMEDIA), 31., 2025, Rio de Janeiro/RJ. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 501-505. DOI: https://doi.org/10.5753/webmedia.2025.16072.
