Automatic Identification of Relevant Moments in Security Force Videos Using Multimodal Analysis

Luísa Ferreira; Michel Silva

Luísa Ferreira UFV http://orcid.org/0009-0002-9587-9603
Michel Silva UFV https://orcid.org/0000-0002-2499-9619

Abstract

As more police forces require officers to wear body cameras, the need grows for algorithms that can automatically detect relevant moments in the resulting footage. This paper presents an automated system that uses audio and video inputs to highlight key events in recordings, so that operators do not have to watch entire videos. Our method detects firearms and crowd gatherings with object detection and identifies people raising their hands with pose estimation. We also detect sound patterns such as sirens, gunshots, and shouts, and use Automatic Speech Recognition to transcribe conversations and identify keywords associated with relevant events. Evaluated on videos from YouTube channels such as PMTVSP, PoliceActivity, and Code Blue Cam, our system effectively identifies significant moments in security footage where agents are engaged in activities beyond routine patrol.
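
The abstract does not say how the per-modality detections are combined into highlighted moments. As a minimal sketch, assuming each detector emits timestamped labels and that detections close in time belong to the same event, the merging step could look like the following. The names `Detection`, `relevant_moments`, and `padding_s` are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """One detector hit (hypothetical structure, not from the paper)."""
    time_s: float   # timestamp in the video, in seconds
    modality: str   # e.g. "object", "pose", "audio", "speech"
    label: str      # e.g. "firearm", "hands_up", "siren", "keyword"


def relevant_moments(detections, padding_s=2.0):
    """Merge detections into a list of (start_s, end_s, labels) intervals.

    Each detection is widened by padding_s on both sides (an assumed
    parameter); overlapping or touching windows are fused into one moment.
    """
    moments = []
    for det in sorted(detections, key=lambda d: d.time_s):
        start = max(0.0, det.time_s - padding_s)
        end = det.time_s + padding_s
        if moments and start <= moments[-1][1]:
            # Window reaches the previous moment: extend it and add the label.
            prev_start, prev_end, labels = moments[-1]
            moments[-1] = (prev_start, max(prev_end, end), labels | {det.label})
        else:
            moments.append((start, end, {det.label}))
    return moments


hits = [
    Detection(10.0, "object", "firearm"),
    Detection(12.0, "audio", "gunshot"),
    Detection(60.0, "audio", "siren"),
]
for start, end, labels in relevant_moments(hits):
    print(f"{start:.1f}s to {end:.1f}s: {sorted(labels)}")
```

Under these assumptions, an operator would review only the merged intervals rather than the full recording; the actual fusion rule used by the system may differ.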

Keywords: Egocentric Vision, Video Understanding, Semantic Information, Security Forces
