Identification of Speechless Intervals in Audio Tracks using Convolutional Neural Networks

  • Vinícius Wanderley UFPB
  • Leonardo Villeth UFPB
  • Virgínia P. Campos UFPB
  • Tiago Maritan U. Araújo UFPB
  • Thaís G. do Rêgo UFPB


Audio description (AD) is an accessibility resource designed to improve access for blind or low-vision individuals by describing images and narrating actions and visual elements, such as scene details and aspects of the characters (e.g., age, gender, clothing). In general, however, AD is inserted only in sections of the video that do not contain dialogue. This avoids overlap with the dialogue, which could hinder the user's understanding rather than help it. Thus, one of the first steps in the AD generation process is to identify the speechless intervals, which are candidates to receive AD. In this work, we present a solution for the automatic identification of speechless intervals in digital videos using Convolutional Neural Networks (CNNs). Our proposal is to automate this step of the AD generation process, reducing the time and effort involved in generating AD. Alternatively, the solution could be integrated into an automatic or semi-automatic audio description generation system. The results show that, with a minimum confidence level of 0.5 for the output of the classification model, the solution obtained a balanced average accuracy of 93% in identifying speechless segments across all the videos tested.
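As an illustration of the 0.5 confidence threshold and the balanced accuracy metric mentioned above, the following is a minimal sketch and not the authors' implementation. It assumes that a classifier has already produced one "speechless" probability per fixed-length audio segment, and that "balanced average accuracy" refers to the usual mean of per-class recall. The function names, the segment length, and the sample values are all hypothetical.

```python
from typing import List, Sequence, Tuple


def speechless_intervals(
    probs: Sequence[float],
    segment_seconds: float = 1.0,  # assumed segment length, not from the paper
    threshold: float = 0.5,        # minimum confidence level from the abstract
) -> List[Tuple[float, float]]:
    """Merge consecutive segments classified as speechless into (start, end) times."""
    intervals: List[Tuple[float, float]] = []
    start = None  # index of the first segment of the currently open interval
    for i, p in enumerate(probs):
        if p >= threshold:
            if start is None:
                start = i
        elif start is not None:
            intervals.append((start * segment_seconds, i * segment_seconds))
            start = None
    if start is not None:  # close an interval that runs to the end of the track
        intervals.append((start * segment_seconds, len(probs) * segment_seconds))
    return intervals


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of the recall on the speechless class (1) and on the speech class (0)."""
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        raise ValueError("both classes must be present in y_true")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return (tp / pos + tn / neg) / 2


# Hypothetical per-segment probabilities and ground-truth labels.
probs = [0.9, 0.8, 0.2, 0.1, 0.7, 0.4]
y_true = [1, 1, 0, 0, 0, 0]
y_pred = [1 if p >= 0.5 else 0 for p in probs]

print(speechless_intervals(probs))        # [(0.0, 2.0), (4.0, 5.0)]
print(balanced_accuracy(y_true, y_pred))  # 0.875
```

Balanced accuracy is a natural choice when speech and speechless segments are unevenly represented, since plain accuracy would then be dominated by the majority class.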
How to Cite

WANDERLEY, Vinícius; VILLETH, Leonardo; CAMPOS, Virgínia P.; ARAÚJO, Tiago Maritan U.; RÊGO, Thaís G. do. Identification of Speechless Intervals in Audio Tracks using Convolutional Neural Networks. In: SIMPÓSIO BRASILEIRO DE SISTEMAS MULTIMÍDIA E WEB (WEBMEDIA), 25., 2019, Rio de Janeiro. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2019. p. 153-160.