Identification of Speechless Intervals in Audio Tracks using Convolutional Neural Networks

Vinícius Wanderley; Leonardo Villeth; Virgínia P. Campos; Tiago Maritan U. Araújo; Thaís G. do Rêgo

Vinícius Wanderley UFPB
Leonardo Villeth UFPB
Virgínia P. Campos UFPB
Tiago Maritan U. Araújo UFPB
Thaís G. do Rêgo UFPB

Abstract

Audio description (AD) is an accessibility resource designed to improve access for blind and low-vision individuals. It describes images and narrates actions and visual elements, such as scene details and aspects of the characters (e.g., age, gender, clothing), among others. In general, however, AD is inserted only in sections of the video that contain no dialogue. This prevents the description from overlapping the dialogue, which could hinder the user's understanding rather than help it. Thus, one of the first steps in the AD generation process is to identify the speechless intervals, which are the candidates to receive AD. In this work, we present a solution for the automatic identification of speechless intervals in digital videos using Convolutional Neural Networks (CNNs). Our goal is to automate this step of the AD generation process, reducing the time and effort involved in producing AD. Alternatively, the solution could be integrated into an automatic or semi-automatic audio description generation system. The results show that, with a minimum confidence level of 0.5 for the output of the classification model, the solution achieved a balanced average accuracy of 93% in identifying speechless segments across all the videos tested.
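The post-processing described above can be sketched as follows. The sketch covers classifying segments with a 0.5 confidence threshold, merging them into speechless intervals, and scoring with balanced accuracy. It is a minimal illustration under stated assumptions, not the authors' implementation. It assumes the CNN (not shown) emits, for each fixed-length audio segment, a probability that the segment is speechless. The segment length, the function names `label_segments`, `merge_intervals` and `balanced_accuracy`, and the probability values are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def label_segments(probs: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Label a segment 1 (speechless) when the assumed speechless probability
    reaches the confidence threshold, otherwise 0 (speech)."""
    return [1 if p >= threshold else 0 for p in probs]


def merge_intervals(labels: Sequence[int],
                    segment_seconds: float) -> List[Tuple[float, float]]:
    """Merge runs of consecutive speechless segments into (start, end)
    intervals in seconds; segment_seconds is a hypothetical fixed length."""
    intervals: List[Tuple[float, float]] = []
    start: Optional[int] = None
    for i, lab in enumerate(labels):
        if lab == 1 and start is None:
            start = i  # a speechless run begins
        elif lab == 0 and start is not None:
            intervals.append((start * segment_seconds, i * segment_seconds))
            start = None  # the run ended at a speech segment
    if start is not None:  # a run that lasts until the end of the track
        intervals.append((start * segment_seconds, len(labels) * segment_seconds))
    return intervals


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean per-class recall over the classes present in y_true."""
    recalls = []
    for cls in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == cls]
        hits = sum(1 for i in idx if y_pred[i] == cls)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


if __name__ == "__main__":
    # Made-up per-segment outputs standing in for the CNN's predictions.
    probs = [0.1, 0.7, 0.9, 0.4, 0.6, 0.8]
    labels = label_segments(probs)
    print(labels)                        # [0, 1, 1, 0, 1, 1]
    print(merge_intervals(labels, 0.5))  # [(0.5, 1.5), (2.0, 3.0)]
    # Hypothetical ground truth for the same six segments.
    print(balanced_accuracy([0, 1, 1, 1, 1, 1], labels))
```

Balanced accuracy averages the recall of the speech and speechless classes. A model therefore cannot score well simply by predicting whichever class dominates a track, which makes the metric a reasonable choice when speech and silence are unevenly distributed.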
