Spatio-temporal Localization of Actors in Video/360-Video and its Applications

  • Paulo Mendes PUC-Rio
  • Sérgio Colcher PUC-Rio


The popularity of video storage and streaming platforms has produced a substantial volume of video data. Given a set of actors present in a video, metadata that records the time intervals in which each actor appears, together with the actor's 2D position in every frame of those intervals, can facilitate video retrieval and recommendation. In this work, we investigate Video Face Clustering as a means of obtaining this spatio-temporal localization of actors in videos. We first describe our Video Face Clustering method, which combines face detection, face embeddings, and clustering to group similar faces of actors across frames and thereby localize each actor in space and time. We then propose and investigate applications of this spatio-temporal localization in three tasks: (i) Video Face Recognition, (ii) Educational Video Recommendation, and (iii) Subtitle Positioning in 360-video.
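The pipeline described above (detect faces, embed them, cluster the embeddings, then read off intervals and boxes per cluster) can be illustrated with a minimal sketch. The abstract does not name the detector, embedding model, or clustering algorithm, so everything here is an assumption made for illustration: the `FaceDetection` record, the toy 2D embeddings, the greedy cosine-similarity clustering (a simple stand-in, not the authors' method), and the `threshold` and `max_gap` parameters are all hypothetical.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class FaceDetection:
    """One detected face (hypothetical record; a real detector would produce it)."""
    frame: int        # index of the frame in which the face was found
    box: tuple        # (x, y, width, height) in pixels
    embedding: list   # face embedding vector (toy 2D values below)


def cosine_similarity(a, b):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def cluster_faces(detections, threshold=0.8):
    """Greedy stand-in clustering: a face joins the first cluster whose
    first member is at least `threshold` similar, else starts a new cluster."""
    clusters = []
    for det in detections:
        for cluster in clusters:
            if cosine_similarity(det.embedding, cluster[0].embedding) >= threshold:
                cluster.append(det)
                break
        else:
            clusters.append([det])
    return clusters


def localize(cluster, max_gap=1):
    """Convert one cluster (one presumed actor) into presence intervals.
    Frames at most `max_gap` apart are merged into the same interval;
    each interval keeps the bounding box observed in every frame."""
    intervals = []
    for det in sorted(cluster, key=lambda d: d.frame):
        if intervals and det.frame - intervals[-1]["end"] <= max_gap:
            intervals[-1]["end"] = det.frame
            intervals[-1]["boxes"][det.frame] = det.box
        else:
            intervals.append(
                {"start": det.frame, "end": det.frame, "boxes": {det.frame: det.box}}
            )
    return intervals


# Toy input: two actors with clearly separated (made-up) embeddings.
detections = [
    FaceDetection(0, (10, 10, 40, 40), [1.0, 0.0]),
    FaceDetection(0, (200, 20, 40, 40), [0.0, 1.0]),
    FaceDetection(1, (12, 10, 40, 40), [0.98, 0.05]),
    FaceDetection(2, (14, 11, 40, 40), [0.99, 0.02]),
    FaceDetection(5, (205, 22, 40, 40), [0.03, 0.97]),
    FaceDetection(6, (207, 22, 40, 40), [0.01, 0.99]),
]

clusters = cluster_faces(detections)
tracks = [localize(cluster) for cluster in clusters]

for actor_id, intervals in enumerate(tracks):
    for interval in intervals:
        print(actor_id, interval["start"], interval["end"], interval["boxes"])
```

Each entry of `tracks` is the spatio-temporal metadata for one cluster: a list of frame intervals, each carrying the per-frame bounding boxes. In practice the greedy step would likely be replaced by a standard clustering algorithm run over embeddings from a trained face model.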

Keywords: clustering, face recognition, video recommendation, 360-video, multimedia authoring


How to Cite
MENDES, Paulo; COLCHER, Sérgio. Spatio-temporal Localization of Actors in Video/360-Video and its Applications. In: CONCURSO DE TESES E DISSERTAÇÕES - SIMPÓSIO BRASILEIRO DE SISTEMAS MULTIMÍDIA E WEB (WEBMEDIA), 28., 2022, Curitiba. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2022. p. 19-22. ISSN 2596-1683. DOI: