Introducing a Self-Supervised, Superfeature-Based Network for Video Object Segmentation

  • Marcelo Mendonça, IFBA
  • Luciano Oliveira, UFBA


This work introduces SHLS, a novel video object segmentation (VOS) method that combines superpixels and deep learning features to construct image representations in a highly compressed latent space. The approach is entirely self-supervised and is trained solely on a small dataset of unlabeled still images. Embedding convolutional features into the corresponding superpixel areas yields ultra-compact vectors called superfeatures. These superfeatures form the basis of a memory mechanism that efficiently stores and retrieves information from past frames, improving the segmentation of the current frame. We evaluated SHLS on the DAVIS dataset, the primary benchmark for VOS, and achieved superior performance in single-object segmentation and competitive results in multi-object segmentation, outperforming state-of-the-art self-supervised methods that require much larger video-based datasets. Our code and trained model are publicly available at:
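
As a rough illustration of the superfeature idea described in the abstract, the sketch below collapses a dense feature map into one vector per superpixel. It assumes plain average pooling over each superpixel area; the abstract does not state which pooling SHLS uses, and the function name `pool_superfeatures` and its arguments are hypothetical, not taken from the authors' code.

```python
import numpy as np


def pool_superfeatures(features, labels):
    """Average the feature vectors that fall inside each superpixel.

    features: (H, W, C) float array, e.g. a convolutional feature map.
    labels:   (H, W) int array of superpixel ids in the range 0..K-1.
    Returns a (K, C) array with one compact vector per superpixel.

    Assumption: mean pooling stands in for whatever embedding SHLS applies.
    """
    h, w, c = features.shape
    flat_feats = features.reshape(-1, c)
    flat_labels = labels.reshape(-1)
    k = int(flat_labels.max()) + 1

    # Accumulate per-superpixel feature sums (unbuffered scatter-add).
    sums = np.zeros((k, c), dtype=np.float64)
    np.add.at(sums, flat_labels, flat_feats)

    # Count pixels per superpixel; guard against empty ids.
    counts = np.bincount(flat_labels, minlength=k).astype(np.float64)
    counts = np.maximum(counts, 1.0)

    return sums / counts[:, None]


# Toy usage: a 4x4 map with 8 channels split into 2 superpixels.
toy_features = np.random.rand(4, 4, 8)
toy_labels = np.zeros((4, 4), dtype=np.int64)
toy_labels[:, 2:] = 1
toy_superfeatures = pool_superfeatures(toy_features, toy_labels)  # shape (2, 8)
```

In this toy case, 16 pixel-level feature vectors are reduced to 2, which is the kind of compression that would make storing past frames in a memory inexpensive.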


MENDONÇA, Marcelo; OLIVEIRA, Luciano. Introducing a Self-Supervised, Superfeature-Based Network for Video Object Segmentation. In: WORKSHOP DE TESES E DISSERTAÇÕES - CONFERENCE ON GRAPHICS, PATTERNS AND IMAGES (SIBGRAPI), 37., 2024, Manaus/AM. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2024. p. 1-7. DOI:

