Analysis of Frequency Range Effect on the Detection of Voice Disorder Using Convolutional Neural Networks Trained on Spectogram Images

  • José Alberto Souza Paulino (UFCG)
  • Herman Martins Gomes (UFCG)
  • Leonardo Vidal Batista (UFPB)
  • Leonardo Wanderley Lopes (UFPB)

Abstract


Given recent advances in signal processing and machine learning (ML), non-invasive techniques for assessing vocal quality have become increasingly popular, particularly those that use spectrograms for acoustic analysis. Such analyses, whether performed by visual inspection or by ML algorithms, typically do not evaluate patterns in regions above 5 kHz. This study assesses the relevance of different frequency ranges for classifying healthy and disordered voices using convolutional neural networks (CNNs), and investigates whether combining frequency ranges can improve classification results. To this end, spectrogram subsets were generated for 16 frequency ranges in two datasets, obtained through a bank of band-pass filters, and used to train CNN models with transfer learning. The relevance of each frequency range was first evaluated individually. Then, the results of the 65,536 possible combinations of the 16 frequency ranges were assessed. The analysis revealed that voice pathology patterns can be characterized in frequency regions above 5 kHz, but that the interval from 1 to 1,462 Hz has substantially greater descriptive capacity in spectrograms. Additionally, high-frequency regions, when combined with other frequency ranges, produced better classification results, improving test accuracy from 80.53% to 82.10% on the SVD dataset and from 78.11% to 82.12% on the AVFAD dataset.
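
The figure of 65,536 combinations matches 2^16, the number of subsets of 16 items when the empty subset is counted. The following is a minimal counting sketch under that assumption, not the authors' code: the name `band_subsets` and the bitmask encoding are illustrative choices, and the abstract does not say how the selected ranges are merged before being passed to the CNN.

```python
N_BANDS = 16  # number of frequency ranges reported in the abstract


def band_subsets(n_bands):
    """Yield every subset of band indices, decoded from an integer bitmask.

    Bit i of the mask being set means band i is included in the subset.
    (Hypothetical helper, used here only to illustrate the count.)
    """
    for mask in range(1 << n_bands):
        yield tuple(i for i in range(n_bands) if (mask >> i) & 1)


subsets = list(band_subsets(N_BANDS))
total = len(subsets)       # all subsets, including the empty one
non_empty = total - 1      # subsets that contain at least one band

print(f"total subsets: {total}, non-empty subsets: {non_empty}")
```

Each bitmask maps to exactly one subset, so an exhaustive search over band combinations can be driven by a simple integer loop from 0 to 2^16 - 1.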
Keywords: Pathology, Visualization, Machine learning algorithms, Databases, Transfer learning, Signal processing algorithms, Feature extraction, Vectors, Convolutional neural networks, Spectrogram
Published
30 September 2024
PAULINO, José Alberto Souza; GOMES, Herman Martins; BATISTA, Leonardo Vidal; LOPES, Leonardo Wanderley. Analysis of Frequency Range Effect on the Detection of Voice Disorder Using Convolutional Neural Networks Trained on Spectogram Images. In: CONFERENCE ON GRAPHICS, PATTERNS AND IMAGES (SIBGRAPI), 37., 2024, Manaus/AM. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2024.