Automatic Identification of Relevant Moments in Security Force Videos Using Multimodal Analysis
Abstract
As more police forces require officers to wear body cameras, the need grows for algorithms that can automatically detect relevant moments in the footage. This paper presents an automated system that uses audio and video inputs to highlight key events in recordings, so that operators do not have to watch entire videos. Our method detects firearms and crowd gatherings with object detection and identifies people raising their hands with pose estimation. We also detect sound patterns such as sirens, gunshots, and shouts, and use Automatic Speech Recognition to transcribe conversations and spot keywords that indicate relevant events. Evaluated on videos from YouTube channels such as PMTVSP, PoliceActivity, and Code Blue Cam, the system identifies significant moments in security footage in which agents are engaged in activities beyond routine patrol.
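The abstract does not say how the outputs of the individual detectors are combined, so the following is only a minimal sketch of one plausible fusion step, not the authors' implementation. It assumes each detector (object detection, pose estimation, audio event detection, keyword spotting on the transcript) emits time-stamped events. The names `Event`, `merge_events`, `keyword_events`, and the `gap` parameter are hypothetical and introduced here for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One detection from any modality (hypothetical structure)."""
    start: float  # seconds from the beginning of the video
    end: float
    label: str    # e.g. "firearm", "siren", "keyword"


def merge_events(events, gap=2.0):
    """Merge events that overlap or lie within `gap` seconds of each other.

    Returns a list of (start, end, labels) tuples, sorted by start time.
    The 2-second default is an arbitrary assumption.
    """
    segments = []
    for ev in sorted(events, key=lambda e: e.start):
        if segments and ev.start <= segments[-1][1] + gap:
            start, end, labels = segments[-1]
            segments[-1] = (start, max(end, ev.end), labels | {ev.label})
        else:
            segments.append((ev.start, ev.end, {ev.label}))
    return segments


def keyword_events(transcript, keywords):
    """Turn transcript segments containing a keyword into events.

    `transcript` is a list of (start, end, text) tuples, as any ASR
    system with segment timestamps could provide. Matching is a plain
    whitespace split, so punctuation attached to a word prevents a match.
    """
    hits = []
    for start, end, text in transcript:
        words = text.lower().split()
        if any(k in words for k in keywords):
            hits.append(Event(start, end, "keyword"))
    return hits
```

With this shape, an operator-facing tool would jump straight to each merged segment and display its set of labels, instead of playing the whole recording.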
Keywords:
Egocentric Vision, Video Understanding, Semantic Information, Security Forces
Published
06/11/2024
How to Cite
FERREIRA, Luísa; SILVA, Michel. Automatic Identification of Relevant Moments in Security Force Videos Using Multimodal Analysis. In: WORKSHOP DE VISÃO COMPUTACIONAL (WVC), 19., 2024, Rio Paranaíba/MG. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2024. p. 67-74.