An Action Recognition Approach with Context and Multiscale Motion Awareness
Abstract
Despite the substantial progress of computer vision approaches in tasks such as image classification, object detection, and pose estimation, activity recognition remains one of the key challenges in computer vision and pattern recognition. This paper proposes a new learning framework based on multiscale spatiotemporal graph convolution layers and a transformer architecture. Although several approaches achieve high accuracy on more traditional datasets such as NTU, their performance drops significantly on datasets with a high level of ambiguity among activities and an unbalanced number of samples per class. We evaluated our architecture on the challenging BABEL dataset, where it achieved state-of-the-art accuracy (65.4%) in action classification when considering both ambiguity and class imbalance. The source code and trained models are publicly available at https://github.com/verlab/AnActionRecognitionApproach_SIBGRAPI_2022.