Evaluation of 3D Convolutional Neural Networks on Isolated Brazilian Sign Language Recognition
Abstract
Isolated Sign Language Recognition (ISLR) plays a crucial role in advancing technologies that enhance accessibility for deaf individuals. In this work, we investigate the performance of several 3D Convolutional Neural Networks on ISLR in Brazilian Sign Language (LIBRAS). Models were trained on the MINDS dataset and evaluated on an out-of-domain dataset, MALTA-LIBRAS. Experimental results indicate that, while all models achieve high metrics on the training and validation data, they exhibit clear overfitting and fail to generalize effectively to unseen samples. These results highlight the challenges that limited data and data variability pose for ISLR in LIBRAS.
References
Carreira, J. and Zisserman, A. (2017). Quo vadis, action recognition? A new model and the kinetics dataset. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 6299–6308.
Delucis, M. M. (2025). Isolated Sign Language Recognition in LIBRAS. Master’s thesis, Programa de Pós-Graduação em Ciência da Computação, PUCRS.
Feichtenhofer, C. (2020). X3d: Expanding architectures for efficient video recognition. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 203–213.
Feichtenhofer, C., Fan, H., Malik, J., and He, K. (2019). Slowfast networks for video recognition. In Proceedings of the IEEE/CVF international conference on computer vision, pages 6202–6211.
Howard, A. G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., and Adam, H. (2017). Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861.
Kay, W., Carreira, J., Simonyan, K., Zhang, B., Hillier, C., Vijayanarasimhan, S., Viola, F., Green, T., Back, T., Natsev, P., et al. (2017). The kinetics human action video dataset. arXiv.
Li, D., Opazo, C., Yu, X., and Li, H. (2020). Word-level deep sign language recognition from video: A new large-scale dataset and methods comparison. In IEEE Winter Conference on Applications of Computer Vision (WACV), pages 1448–1458, Los Alamitos, CA, USA. IEEE Computer Society.
Loshchilov, I. and Hutter, F. (2019). Decoupled weight decay regularization. In International Conference on Learning Representations, page 18. OpenReview.
Paula Pfeifer Moreira (2025). Quantos surdos usam libras no brasil em 2025. Accessed in: August 2025.
Rezende, T. M., Almeida, S. G. M., and Guimarães, F. G. (2021). Development and validation of a brazilian sign language database for human gesture recognition. Neural Computing and Applications, 33(16):10449–10467.
World Federation of the Deaf (2024). Our work. Accessed in: Nov 2023.
World Health Organization (2021). World Report on Hearing. World Health Organization.
Published
12/11/2025
How to Cite
LAZZAROTTO, Lorenzo C.; DELUCIS, Marcelo M.; BARCELOS, Pedro T.; BARROS, Rodrigo C.; KUPSSINSKÜ, Lucas S. Evaluation of 3D Convolutional Neural Networks on Isolated Brazilian Sign Language Recognition. In: ESCOLA REGIONAL DE APRENDIZADO DE MÁQUINA E INTELIGÊNCIA ARTIFICIAL DA REGIÃO SUL (ERAMIA-RS), 1., 2025, Porto Alegre/RS. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 168-171. DOI: https://doi.org/10.5753/eramiars.2025.16773.