A Multimodal Frame Sampling Algorithm for Semantic Hyperlapses with Musical Alignment
Abstract
Producing visually engaging and semantically meaningful hyperlapses presents unique challenges, particularly when integrating an audio track to enhance the viewing experience. This paper introduces a novel multimodal algorithm to create hyperlapses that optimize semantic content retention, visual stability, and the alignment of playback speed to the liveliness of an accompanying song. We use object detection to estimate the semantic importance of each frame and analyze the song's perceptual loudness to determine its liveliness. Then, we align the most important segments of the video, where the hyperlapse slows down, with the quieter parts of the song, signaling a shift in attention from the music to the video. Our experiments show that our approach outperforms existing methods in semantic retention and loudness-speed correlation, while maintaining comparable performance in camera stability and temporal continuity.
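To make the idea of coupling playback speed to loudness concrete, the following is a minimal illustrative sketch, not the method proposed in the paper. It assumes a loudness value is already available for each output frame, and it uses a naive greedy rule in which louder output slots skip more input frames. The function names, the `min_skip` and `max_skip` parameters, and the use of Pearson correlation as the loudness-speed measure are all assumptions made for illustration; semantic importance and stability are deliberately left out.

```python
import numpy as np


def sample_frames(loudness, n_frames, min_skip=1, max_skip=8):
    """Hypothetical greedy sampler: louder output slots skip more input frames.

    `loudness` holds one value per output frame (an assumption of this sketch).
    Returns the indices of the selected input frames.
    """
    loudness = np.asarray(loudness, dtype=float)
    span = loudness.max() - loudness.min()
    # Normalize loudness to [0, 1]; a flat curve maps to the minimum skip.
    norm = (loudness - loudness.min()) / span if span > 0 else np.zeros_like(loudness)
    skips = np.rint(min_skip + norm * (max_skip - min_skip)).astype(int)

    selected = [0]
    for skip in skips:
        nxt = selected[-1] + int(skip)
        if nxt >= n_frames:
            break  # ran out of input frames
        selected.append(nxt)
    return np.array(selected)


def loudness_speed_correlation(loudness, selected):
    """Pearson correlation between loudness and the per-step frame skip."""
    loudness = np.asarray(loudness, dtype=float)
    speed = np.diff(selected)  # input frames advanced per output frame
    return float(np.corrcoef(loudness[: len(speed)], speed)[0, 1])


# Usage: two quiet slots followed by two loud slots.
loudness = [0.0, 0.0, 1.0, 1.0]
selected = sample_frames(loudness, n_frames=100)
corr = loudness_speed_correlation(loudness, selected)
```

In this toy setting the hyperlapse advances slowly during the quiet slots and quickly during the loud ones, so loudness and speed are positively correlated. A full approach along the lines of the abstract would additionally need to place semantically important segments in those slow, quiet regions while keeping the camera motion stable.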
Keywords:
Visualization, Correlation, Semantics, Music, Object detection, Hyperparameter optimization, Stability analysis, Libraries, Multiple signal classification, Pattern matching
Published
30/09/2024
How to Cite
NEPOMUCENO, Raphael; FERREIRA, Luísa; SILVA, Michel. A Multimodal Frame Sampling Algorithm for Semantic Hyperlapses with Musical Alignment. In: CONFERENCE ON GRAPHICS, PATTERNS AND IMAGES (SIBGRAPI), 37., 2024, Manaus/AM. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2024.