A Cluster-Based Method for Action Segmentation Using Spatio-Temporal and Positional Encoded Embeddings

  • Guilherme de A. P. Marques PUC-Rio
  • Antonio José G. Busson PUC-Rio
  • Álan Lívio V. Guedes PUC-Rio
  • Sérgio Colcher PUC-Rio

Abstract


A crucial task in overall video understanding is the recognition and localisation in time of the different actions or events that occur across scenes. Addressing this problem requires action segmentation, which consists of temporally segmenting a video by labeling each frame with a specific action. In this work, we propose a novel action segmentation method that requires no prior video analysis and no annotated data. Our method extracts spatio-temporal features from videos in samples of 0.5 s using a pre-trained deep network. The features are then transformed using a positional encoder, and finally a clustering algorithm is applied, with a silhouette score used to find the optimal number of clusters, where each cluster presumably corresponds to a single, distinguishable action. In experiments, we show that our method produces competitive results on the Breakfast and Inria Instructional Videos benchmark datasets.
Keywords: Action segmentation, Action recognition, Positional encoding, I3D
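
The pipeline described in the abstract (positional encoding of per-sample features, followed by clustering with the number of clusters chosen by silhouette score) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: it assumes the spatio-temporal features have already been extracted (one row per 0.5 s sample), it assumes a Transformer-style sinusoidal encoding added to the features, and it assumes k-means as the clustering algorithm, none of which the abstract specifies. The function names `sinusoidal_positional_encoding` and `segment_actions` and the parameter `max_clusters` are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def sinusoidal_positional_encoding(num_positions, dim):
    """Transformer-style sinusoidal encoding (an assumed choice of encoder)."""
    positions = np.arange(num_positions)[:, None]
    pair_index = np.arange(dim)[None, :] // 2
    angles = positions / np.power(10000.0, 2.0 * pair_index / dim)
    encoding = np.empty((num_positions, dim))
    encoding[:, 0::2] = np.sin(angles[:, 0::2])  # even dimensions
    encoding[:, 1::2] = np.cos(angles[:, 1::2])  # odd dimensions
    return encoding


def segment_actions(features, max_clusters=10, seed=0):
    """Cluster positionally encoded features; pick k by silhouette score.

    features: array of shape (num_samples, dim), one row per video sample.
    Returns (best_k, labels), where labels[i] is the cluster of sample i.
    """
    features = np.asarray(features, dtype=float)
    # Inject temporal position so that nearby samples are pulled together.
    encoded = features + sinusoidal_positional_encoding(*features.shape)

    best_k, best_score, best_labels = None, -1.0, None
    # The silhouette score is defined for 2 <= k <= num_samples - 1.
    upper = min(max_clusters, len(encoded) - 1)
    for k in range(2, upper + 1):
        # k-means is an assumption; the abstract only says "clustering".
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(encoded)
        score = silhouette_score(encoded, labels)
        if score > best_score:
            best_k, best_score, best_labels = k, score, labels
    return best_k, best_labels
```

Calling `segment_actions(features)` on a `(num_samples, dim)` array returns the selected number of clusters and one cluster label per sample; consecutive samples sharing a label would then be read as one action segment.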
Published
2021-11-05
How to Cite

MARQUES, Guilherme de A. P.; BUSSON, Antonio José G.; GUEDES, Álan Lívio V.; COLCHER, Sérgio. A Cluster-Based Method for Action Segmentation Using Spatio-Temporal and Positional Encoded Embeddings. In: SIMPÓSIO BRASILEIRO DE SISTEMAS MULTIMÍDIA E WEB (WEBMEDIA), 1., 2021, Minas Gerais. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2021. p. 181-187.