Recognition of static LIBRAS signs with Support Vector Machines using Kinect

Leonardo Perdomo (UNILASALLE); Mozart Lemos de Siqueira (UNILASALLE)

Abstract

This article presents the experiments the author carried out on a prototype he developed for the computational recognition of the static signs of the manual alphabet of Brazilian Sign Language (LIBRAS). The signs are captured with the depth sensor of the Microsoft Kinect and recognized using image pattern recognition techniques, with classification by Support Vector Machines (SVM) in a multiclass approach. We report the prototype's results together with an efficiency analysis based on execution-time measurements and sign recognition accuracy. The experiments covered the distance range within the practical limits of the device's near mode (0.8 m to 2.5 m).
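To make the pipeline described in the abstract more concrete, the sketch below shows two of its ingredients under stated assumptions: keeping only depth pixels inside the 0.8 m to 2.5 m near-mode range, and a multiclass linear SVM. It is not the authors' implementation. The depth unit (millimetres), the one-vs-rest multiclass scheme, the Pegasos-style stochastic subgradient trainer, and all names (`depth_mask`, `train_one_vs_rest`, `predict`) are assumptions chosen for illustration. The 2-D points are hypothetical stand-ins for whatever feature vectors the prototype extracts from the segmented hand.

```python
import random

# Practical near-mode range quoted in the abstract (0.8 m to 2.5 m),
# expressed in millimetres; the mm unit for depth readings is an assumption.
NEAR_MIN_MM = 800
NEAR_MAX_MM = 2500


def depth_mask(depth_mm):
    """Binary mask: 1 where a depth reading lies inside the near-mode range."""
    return [[1 if NEAR_MIN_MM <= d <= NEAR_MAX_MM else 0 for d in row]
            for row in depth_mm]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def train_binary_svm(xs, ys, lam=0.1, epochs=500, seed=0):
    """Linear SVM (L2-regularised hinge loss) trained with Pegasos-style
    stochastic subgradient steps. Labels in ys must be +1 or -1."""
    rng = random.Random(seed)
    w = [0.0] * len(xs[0])
    order = list(range(len(xs)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            # Margin is checked against the current weights, before the update.
            violated = ys[i] * dot(w, xs[i]) < 1
            # Subgradient of the regulariser shrinks w ...
            w = [(1.0 - eta * lam) * wj for wj in w]
            # ... and a margin violation pulls w toward y_i * x_i.
            if violated:
                w = [wj + eta * ys[i] * xj for wj, xj in zip(w, xs[i])]
    return w


def train_one_vs_rest(xs, labels, lam=0.1, epochs=500):
    """One binary SVM per class (assumed multiclass scheme: one-vs-rest)."""
    # A constant 1.0 feature plays the role of a (regularised) bias term.
    aug = [list(x) + [1.0] for x in xs]
    models = {}
    for c in sorted(set(labels)):
        ys = [1 if lab == c else -1 for lab in labels]
        models[c] = train_binary_svm(aug, ys, lam, epochs)
    return models


def predict(models, x):
    """Pick the class whose binary SVM gives the highest score."""
    xa = list(x) + [1.0]
    return max(models, key=lambda c: dot(models[c], xa))


# Hypothetical toy features for three hand-shape classes.
X = [[5, 0], [5.5, 0.5], [4.5, -0.5],
     [0, 5], [0.5, 5.5], [-0.5, 4.5],
     [-5, -5], [-5.5, -4.5], [-4.5, -5.5]]
Y = ["A"] * 3 + ["B"] * 3 + ["C"] * 3

models = train_one_vs_rest(X, Y)
print(depth_mask([[500, 800, 1500], [2500, 2600, 0]]))
print([predict(models, p) for p in ([5, 0], [0, 5], [-5, -5])])
```

One-vs-rest was picked only because it is the simplest multiclass reduction to write out; one-vs-one voting (the strategy used by some SVM libraries) or a decision DAG would be equally plausible readings of "a multiclass approach".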
