Flourishing between Sounds and Silences: A Support Tool for Classifying Nonverbal Vocalizations with Artificial Intelligence

Fernanda Floriano Silva (USP); Alessandra Alaniz Macedo (USP)

DOI: https://doi.org/10.5753/webmedia_estendido.2025.16396

Abstract

Communication is a fundamental human right, yet it can be compromised in neurodevelopmental disorders. Nonverbal children with Autism Spectrum Disorder (ASD) are one example: their vocalizations often lack intelligibility. This study explores how artificial intelligence can support phonological analysis in this context. A Brazilian Portuguese dataset was built from two sources: reference phonemes, processed with acoustic feature extraction and data augmentation, and preprocessed vocalizations from a nonverbal child. Unsupervised methods revealed consistent phonological approximations, particularly in nasal categories. In the supervised analysis, samples were represented with Bag-of-Audio-Words (BoAW) combined with acoustic features, and class imbalance was addressed with SMOTE (Synthetic Minority Over-sampling Technique). The evaluated models were k-nearest neighbors (KNN), random forest (RF), multilayer perceptron (MLP), support vector machine (SVM), and convolutional neural network (CNN). SVM achieved the best performance in terms of phonetic/articulatory equivalences, RF was robust in imbalanced scenarios, and CNN reached high accuracy on the validation set. Comparison with perceptual-auditory analyses by speech therapists confirmed relevant convergences. These findings highlight the feasibility of computational models as complementary resources to clinical listening, supporting therapeutic interventions and the development of child speech.
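
The two representation steps named in the abstract, BoAW encoding and SMOTE-style oversampling, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the hand-written `codebook`, and the two-dimensional frame vectors are all hypothetical stand-ins. In practice the codebook would typically be learned by clustering frame-level acoustic features, and an established SMOTE implementation would likely replace the simplified interpolation shown here.

```python
import math
import random


def nearest_codeword(frame, codebook):
    """Return the index of the codeword closest to one frame (Euclidean distance)."""
    distances = [math.dist(frame, word) for word in codebook]
    return distances.index(min(distances))


def boaw_histogram(frames, codebook):
    """Encode a variable-length recording as a normalized histogram of codeword counts."""
    counts = [0] * len(codebook)
    for frame in frames:
        counts[nearest_codeword(frame, codebook)] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(codebook)
    return [c / total for c in counts]


def smote_like_sample(minority, k=2, rng=None):
    """Create one synthetic vector on the segment between a minority sample
    and one of its k nearest minority neighbors (needs at least two samples)."""
    rng = rng or random.Random(0)
    base = rng.choice(minority)
    neighbors = sorted(
        (v for v in minority if v is not base),
        key=lambda v: math.dist(base, v),
    )[:k]
    other = rng.choice(neighbors)
    gap = rng.random()  # interpolation weight in [0, 1)
    return [b + gap * (o - b) for b, o in zip(base, other)]


# Hypothetical codebook of three "audio words" and one four-frame recording.
codebook = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
recording = [[0.1, -0.1], [0.9, 1.2], [1.1, 0.8], [4.8, 5.1]]
print(boaw_histogram(recording, codebook))  # [0.25, 0.5, 0.25]

# Hypothetical minority class: three BoAW histograms of the same phoneme label.
minority = [[0.25, 0.5, 0.25], [0.5, 0.5, 0.0], [0.0, 0.5, 0.5]]
print(smote_like_sample(minority, rng=random.Random(42)))
```

The histogram gives every recording a fixed-length vector regardless of its duration, which is what lets classifiers such as SVM or RF consume vocalizations of different lengths. The synthetic sample stays between two real minority vectors, so oversampling adds variety without leaving the region the class already occupies.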

Keywords: artificial intelligence, audio cluster, bag of audio words, convolutional neural networks, support vector machine, assistive tool
