Facial Expression Analysis in Brazilian Sign Language for Sign Recognition
Abstract
Sign language is one of the main forms of communication used by the deaf community. The language’s smallest unit, a “sign”, comprises a series of intricate manual and facial gestures. Unlike speech recognition, sign language recognition (SLR) still lags behind and presents a multitude of open challenges because the language is visual-motor. This paper explores two novel approaches to feature extraction of facial expressions in SLR and proposes the use of Random Forest (RF) in Brazilian SLR as a scalable alternative to Support Vector Machines (SVM) and k-Nearest Neighbors (k-NN). Results show that RF’s performance is at least comparable to that of SVM and k-NN, and validate non-manual parameter recognition as a consistent step towards SLR.
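The classifier comparison described above can be prototyped with scikit-learn (Pedregosa et al., 2011, cited below). The sketch that follows is a minimal, hypothetical setup rather than the paper's actual pipeline: the synthetic feature matrix, the number of facial-expression classes, and the hyperparameters are illustrative assumptions standing in for the extracted facial-expression features.

```python
# Minimal sketch (illustrative assumptions, not the authors' pipeline):
# compare Random Forest, SVM, and k-NN on pre-extracted facial-expression
# feature vectors using 5-fold cross-validation.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(42)
X = rng.normal(size=(500, 60))    # placeholder for per-frame facial feature vectors
y = rng.integers(0, 5, size=500)  # placeholder labels for non-manual expressions

classifiers = {
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "SVM (RBF)": make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0)),
    "k-NN (k=5)": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
}

for name, clf in classifiers.items():
    scores = cross_val_score(clf, X, y, cv=5)  # accuracy per fold
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```

Scaling is applied inside the SVM and k-NN pipelines because both are sensitive to feature magnitudes, while the tree-based Random Forest is not.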
References
Abdullah, M. F. A., Sayeed, M. S., Muthu, K. S., Bashier, H. K., Azman, A., and Ibrahim, S. Z. (2014). Face recognition with symmetric local graph structure (SLGS). Expert Systems with Applications, 41(14):6131–6137.
Almeida, S. G. M., Guimarães, F. G., and Ramírez, J. A. (2014). Feature extraction in Brazilian sign language recognition based on phonological structure and using RGB-D sensors. Expert Systems with Applications, 41(16):7259–7271.
Boulesteix, A.-L., Janitza, S., Kruppa, J., and König, I. R. (2012). Overview of random forest methodology and practical guidance with emphasis on computational biology and bioinformatics. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 2(6):493–507.
Brasil (2002). Lei nº 10.436, de 24 de abril de 2002.
Breiman, L. (2001). Random forests. Machine Learning, 45(1):5–32.
Capovilla, F. C. (2017). Dicionário da Língua de Sinais do Brasil. A Libras em Suas Mãos - 3 Volumes. Edusp.
Cortes, C. and Vapnik, V. (1995). Support-vector networks. Machine Learning, 20(3):273–297.
de Assis Silva, C. A. (2012). Igreja católica e surdez: território, associação e representação política. Religião & Sociedade, 32(1):13–38.
de Paula Neto, F. M., Cambuim, L. F., Macieira, R. M., Ludermir, T. B., Zanchettin, C., and Barros, E. N. (2015). Extreme learning machine for real time recognition of Brazilian sign language. In 2015 IEEE International Conference on Systems, Man, and Cybernetics. IEEE.
Dias, D. B., Madeo, R. C. B., Rocha, T., Biscaro, H. H., and Peres, S. M. (2009). Hand movement recognition for Brazilian sign language: A study using distance-based neural networks. In 2009 International Joint Conference on Neural Networks. IEEE.
Du, S., Tao, Y., and Martinez, A. M. (2014). Compound facial expressions of emotion. Proceedings of the National Academy of Sciences, 111(15):E1454–E1462.
Elliott, A. and Woodward, W. (2007). Statistical Analysis Quick Reference Guidebook. SAGE Publications, Inc.
Escobedo-Cardenas, E. and Camara-Chavez, G. (2015). A robust gesture recognition using hand local data and skeleton trajectory. In 2015 IEEE International Conference on Image Processing (ICIP). IEEE.
Filho, C. F. F. C., de Souza, R. S., dos Santos, J. R., dos Santos, B. L., and Costa, M. G. F. (2017). A fully automatic method for recognizing hand configurations of Brazilian sign language. Research on Biomedical Engineering, 33(1):78–89.
Freitas, F. A., Peres, S. M., Lima, C. A. M., and Barbosa, F. V. (2017). Grammatical facial expression recognition in sign language discourse: a study at the syntax level. Information Systems Frontiers, 19(6):1243–1259.
Genuer, R., Poggi, J.-M., Tuleau-Malot, C., and Villa-Vialaneix, N. (2017). Random forests for big data. Big Data Research, 9:28–46.
Gesser, A. (2009). LIBRAS? Que língua é essa?: crenças e preconceitos em torno da língua de sinais e da realidade surda. Parábola Editorial, São Paulo.
Gross, R. (2005). Face databases. In Li, S. Z. and Jain, A. K., editors, Handbook of Face Recognition. Springer, New York.
Hinton, G., Deng, L., Yu, D., Dahl, G., Mohamed, A.-r., Jaitly, N., Senior, A., Vanhoucke, V., Nguyen, P., Sainath, T., and Kingsbury, B. (2012). Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. IEEE Signal Processing Magazine, 29(6):82–97.
Hsu, C., Chang, C., and Lin, C. (2016). A practical guide to support vector classification.
Jung, H., Lee, S., Yim, J., Park, S., and Kim, J. (2015). Joint fine-tuning in deep neural networks for facial expression recognition. In 2015 IEEE International Conference on Computer Vision (ICCV). IEEE.
Laborit, E. (1998). The cry of the gull. Gallaudet University Press, Washington, DC.
Landar, H. and Stokoe, W. C. (1961). Sign language structure: An outline of the visual communication systems of the American deaf. Language, 37(2):269.
Ließ, M., Glaser, B., and Huwe, B. (2012). Uncertainty in the spatial prediction of soil texture. Geoderma, 170:70–79.
López, G., Quesada, L., and Guerrero, L. A. (2017). Alexa vs. Siri vs. Cortana vs. Google Assistant: A comparison of speech-based natural user interfaces. In Advances in Intelligent Systems and Computing, pages 241–250. Springer International Publishing.
Meyer, D. (2001). Support vector machines: The interface to libsvm in package e1071. Online documentation of the R package e1071, TU Wien.
Pariwat, T. and Seresangtakul, P. (2017). Thai finger-spelling sign language recognition using global and local features with SVM. In 2017 9th International Conference on Knowledge and Smart Technology (KST). IEEE.
Patrick, E. and Fischer, F. (1970). A generalized k-nearest neighbor rule. Information and Control, 16(2):128–152.
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., and Duchesnay, E. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830.
Pigou, L., Dieleman, S., Kindermans, P.-J., and Schrauwen, B. (2015). Sign language recognition using convolutional neural networks. In Computer Vision - ECCV 2014 Workshops, pages 572–578. Springer International Publishing.
Porfirio, A. J., Wiggers, K. L., Oliveira, L. E., and Weingaertner, D. (2013). LIBRAS sign language hand configuration recognition based on 3D meshes. In 2013 IEEE International Conference on Systems, Man, and Cybernetics. IEEE.
Pu, X., Fan, K., Chen, X., Ji, L., and Zhou, Z. (2015). Facial expression recognition from image sequences using twofold random forest classifier. Neurocomputing, 168:1173–1180.
Rao, G. A., Kishore, P. V. V., Sastry, A. S. C. S., Kumar, D. A., and Kumar, E. K. (2017). Selfie continuous sign language recognition with neural network classifier. In Proceedings of 2nd International Conference on Micro-Electronics, Electromagnetics and Telecommunications, pages 31–40. Springer Singapore.
Rezende, T. M., de Castro, C. L., and Almeida, S. G. M. (2016). An approach for Brazilian sign language (BSL) recognition based on facial expression and k-NN classifier. In Cappabianco, F. A. M., Faria, F. A., J. A., and Körting, T. S., editors, Conference on Graphics, Patterns and Images (SIBGRAPI '16). Sociedade Brasileira de Computação.
Uddin, M. A. and Chowdhury, S. A. (2016). Hand sign language recognition for Bangla alphabet using support vector machine. In 2016 International Conference on Innovations in Science, Engineering and Technology (ICISET). IEEE.
Uddin, M. T. (2015). An ada-random forests based grammatical facial expressions recognition approach. In 2015 International Conference on Informatics, Electronics & Vision (ICIEV). IEEE.
Yang, H.-D. and Lee, S.-W. (2011). Combination of manual and non-manual features for sign language recognition based on conditional random field and active appearance model. In 2011 International Conference on Machine Learning and Cybernetics. IEEE.
Yang, H.-D. and Lee, S.-W. (2013). Robust sign language recognition by combining manual and non-manual features based on conditional random field and support vector machine. Pattern Recognition Letters, 34(16):2051–2056.
Yu, Z. and Zhang, C. (2015). Image based static facial expression recognition with multiple deep network learning. In Proceedings of the 2015 ACM on International Conference on Multimodal Interaction - ICMI '15. ACM Press.
Zeng, Z., Pantic, M., Roisman, G., and Huang, T. (2009). A survey of affect recognition methods: Audio, visual, and spontaneous expressions. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(1):39–58.
Zhang, C., Liu, C., Zhang, X., and Almpanidis, G. (2017). An up-to-date comparison of state-of-the-art classification algorithms. Expert Systems with Applications, 82:128–150.