Multi-Loss Recurrent Residual Networks for Gesture Detection and Recognition

  • Igor L. Bastos Federal University of Minas Gerais
  • Victor Hugo C. de Melo Federal University of Minas Gerais
  • William R. Schwartz Federal University of Minas Gerais


Communication through gestures plays an important role in human life, serving as a non-verbal language for conveying information among individuals. To recognize gestures, computers need to represent and interpret human appearance and motion, involving hands, arms, face, head and/or body, in a mathematical sense. Despite their high applicability in different contexts, most gesture recognition approaches in the literature are not designed to deal with unsegmented videos. That is, most approaches do not temporally detect when a gesture occurs, which prevents them from exploiting correlations between the detection and recognition tasks and limits their application to real-world scenarios. To address this, we propose the Multi-Loss Recurrent Residual Network (MLRRN), a multi-task approach that performs both the recognition and the temporal detection of gestures at once. It employs a dual loss function that takes into account the assignment of each video frame to a gesture class and also determines the frame interval associated with each gesture. Our model takes a dual input, gathering appearance and human pose information from frames, and combines bidirectional recurrent layers with residual modules. According to experiments conducted on the ChaLearn Montalbano and ChaLearn ConGD datasets, our approach achieves results comparable to state-of-the-art methods in terms of the average temporal Jaccard metric.

Keywords: Gesture detection, Gesture recognition, Recurrent Networks, Multi Task
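The abstract reports results using the average temporal Jaccard metric. As an illustration only, the following minimal sketch computes a frame-level Jaccard index per gesture class for a single sequence and averages it over the classes present. The function name `temporal_jaccard`, the use of `0` as a background (no-gesture) label, and the averaging rule are assumptions made for this example; the official ChaLearn evaluation may differ in details such as how classes predicted but absent from the ground truth are treated.

```python
def temporal_jaccard(truth, pred, background=0):
    """Mean per-class frame-level Jaccard index for one sequence.

    truth, pred: equal-length sequences of per-frame class labels.
    background:  label meaning "no gesture" (an assumption of this sketch).
    """
    if len(truth) != len(pred):
        raise ValueError("truth and pred must have the same number of frames")

    # Gesture classes appearing in either the ground truth or the prediction.
    classes = (set(truth) | set(pred)) - {background}
    if not classes:
        return 0.0

    scores = []
    for c in sorted(classes):
        gt_frames = {i for i, y in enumerate(truth) if y == c}
        pr_frames = {i for i, y in enumerate(pred) if y == c}
        # Intersection over union of the frame sets labeled with class c.
        scores.append(len(gt_frames & pr_frames) / len(gt_frames | pr_frames))
    return sum(scores) / len(scores)


# Toy example: two gestures (classes 1 and 2) in an 8-frame sequence.
truth = [0, 1, 1, 1, 0, 2, 2, 0]
pred = [0, 0, 1, 1, 1, 2, 2, 2]
score = temporal_jaccard(truth, pred)
```

In this toy example, class 1 overlaps on 2 of 4 frames in the union and class 2 on 2 of 3, so the sequence score is the mean of 1/2 and 2/3. A dataset-level score would average such values over all test sequences.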


