Federated Learning and Mel-Spectrograms for Physical Violence Detection in Audio


Domestic violence has increased globally as the COVID-19 pandemic combines with economic and social stresses. Some works have used traditional feature extractors to identify features in sound signals for detecting physical violence. However, these extractors have not performed well at recognizing physical violence in audio. Moreover, the use of Machine Learning is limited by the trade-off between collecting more data and preserving users' privacy. Federated Learning (FL) is a technique for building client-server networks in which anonymized training results are uploaded to a central server that aggregates them into a global model, keeps that model up to date, and then distributes the updated model to the client nodes. In this paper, we propose an FL approach to the problem of violence detection in audio signals. The framework was evaluated on a newly proposed synthetic dataset in which audio signals augmented with violence extracts are represented as mel-spectrogram images. The task is thereby treated as an image classification problem solved with pre-trained Convolutional Neural Networks (CNN). The Inception v3, MobileNet v2, ResNet152 v2 and VGG-16 architectures were evaluated; MobileNet achieved the best accuracy (71.9%), a loss of 3.6% compared to the non-FL setting.
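The abstract does not state how the server combines the client updates. As a minimal sketch, assuming the common federated averaging (FedAvg) rule, the aggregation step can be written as an average of each client's layer weights, weighted by the number of local training examples. The function name `fed_avg` and the list-of-arrays model representation are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def fed_avg(client_weights, client_sizes):
    """Aggregate client models by sample-weighted averaging (assumed FedAvg rule).

    client_weights: list of models, each a list of numpy arrays (one per layer).
    client_sizes: number of local training examples held by each client.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need at least one client and one size per client")
    total = float(sum(client_sizes))
    n_layers = len(client_weights[0])
    aggregated = []
    for layer in range(n_layers):
        # Weighted sum of this layer across clients. Only model weights reach
        # the server; the raw audio and spectrograms stay on the clients.
        layer_sum = sum(
            (size / total) * weights[layer]
            for weights, size in zip(client_weights, client_sizes)
        )
        aggregated.append(layer_sum)
    return aggregated


if __name__ == "__main__":
    # Two hypothetical clients, each with a two-layer model.
    client_a = [np.array([1.0, 2.0]), np.array([0.0])]
    client_b = [np.array([3.0, 4.0]), np.array([1.0])]
    global_model = fed_avg([client_a, client_b], [100, 300])
    print(global_model[0], global_model[1])  # → [2.5 3.5] [0.75]
```

Here the second client holds three times as much data, so the global model lands three quarters of the way toward its weights. Weighting by local dataset size keeps clients with few examples from pulling the global model as strongly as clients with many.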
SILVA, Victor E. de S.; LACERDA, Tiago B.; MIRANDA, Péricles; CÂMARA, André; CHAGAS, Amerson Riley Cabral; FURTADO, Ana Paula C. Federated Learning and Mel-Spectrograms for Physical Violence Detection in Audio. In: BRAZILIAN CONFERENCE ON INTELLIGENT SYSTEMS (BRACIS), 12., 2023, Belo Horizonte/MG. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2023. p. 379-393. ISSN 2643-6264.