Machine learning to support the transcription and classification of stuttered speech: a systematic literature review
Abstract
In the healthcare domain, stuttering is identified manually by speech-language pathologists for diagnostic purposes. In this context, Machine Learning (ML) can be a valuable tool to support this activity, for example by automating the transcription of stuttered speech and the classification of disfluencies. This work presents a systematic literature review that investigates how existing studies have provided or used ML methods for the transcription and classification of stuttered speech. It also seeks to identify the extent to which these studies have been applied to effectively support the clinical practice of speech-language pathologists. The analysis includes a survey of datasets, languages, diagnostic criteria, and challenges faced in the identification of stuttering.
References
Adepu, Y., Boga, V. R., & Sairam, U. (2020, November). Interviewee performance analyzer using facial emotion recognition and speech fluency recognition. In 2020 IEEE International Conference for Innovation in Technology (INOCON) (pp. 1-5). IEEE.
Al-Banna, A. K., Edirisinghe, E., Fang, H., & Hadi, W. (2022). Stuttering disfluency detection using machine learning approaches. Journal of Information & Knowledge Management, 21(02), 2250020.
Alharbi, S., Hasan, M., Simons, A. J., Brumfitt, S., & Green, P. (2020). Sequence labeling to detect stuttering events in read speech. Computer Speech & Language, 62, 101052.
Almeida, R. J. S., Fernandes, D. Y. S., Oliveira, L. P., & Correia, D. V. (2023). Desafios e oportunidades na integração do ambiente clínico e digital para apoio ao diagnóstico da gagueira. Computação Brasil, (51), 37-41.
Ambrose, N. G., & Yairi, E. (1999). Normative disfluency data for early childhood stuttering. Journal of Speech, Language, and Hearing Research, 42(4), 895-909.
American Psychiatric Association. (2022). Childhood-Onset Fluency Disorder (Stuttering). In Diagnostic and statistical manual of mental disorders (5th ed.).
Andrade, C. D., Befi-Lopes, D. M., Fernandes, F. D. M., & Wertzner, H. F. (2004). ABFW: teste de linguagem infantil nas áreas de fonologia, vocabulário, fluência e pragmática. São Paulo: Pró-Fono.
Arbajian, P., Hajja, A., Raś, Z. W., & Wieczorkowska, A. A. (2019). Effect of speech segment samples selection in stutter block detection and remediation. Journal of Intelligent Information Systems, 53, 241-264.
Asci, F., Marsili, L., Suppa, A., Saggio, G., Michetti, E., Di Leo, P., & Costantini, G. (2023). Acoustic analysis in stuttering: a machine-learning study. Frontiers in Neurology, 14, 1169707.
Barrett, L., Hu, J., & Howell, P. (2022). Systematic review of machine learning approaches for detecting developmental stuttering. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 30, 1160-1172.
Bayerl, S. P., Wagner, D., Nöth, E., & Riedhammer, K. (2022). Detecting dysfluencies in stuttering therapy using wav2vec 2.0. arXiv preprint arXiv:2204.03417.
Bayerl, S. P., Gerczuk, M., Batliner, A., Bergler, C., Amiriparian, S., Schuller, B., ... & Riedhammer, K. (2023). Classification of stuttering–The ComParE challenge and beyond. Computer Speech & Language, 81, 101519.
Bloodstein, O., Ratner, N. B., & Brundage, S. B. (2021). A handbook on stuttering. Plural Publishing.
Cervantes, J., Garcia-Lamont, F., Rodríguez-Mazahua, L., & Lopez, A. (2020). A comprehensive survey on support vector machine classification: Applications, challenges and trends. Neurocomputing, 408, 189-215.
Chollet, F. (2021). Deep learning with Python. Simon and Schuster.
Deepak, G., Surya, D., Trivedi, I., Kumar, A., & Lingampalli, A. (2022). An artificially intelligent approach for automatic speech processing based on triune ontology and adaptive tribonacci deep neural networks. Computers & Electrical Engineering, 98, 107736.
Deng, J., Xie, X., Wang, T., Cui, M., Xue, B., Jin, Z., ... & Meng, H. (2022). Confidence score based conformer speaker adaptation for speech recognition. arXiv preprint arXiv:2206.12045.
Filipowicz, P., & Kostek, B. (2023). Rediscovering Automatic Detection of Stuttering and Its Subclasses through Machine Learning—The Impact of Changing Deep Model Architecture and Amount of Data in the Training Set. Applied Sciences, 13(10), 6192.
Fox, C. B., Israelsen-Augenstein, M., Jones, S., & Gillam, S. L. (2021). An evaluation of expedited transcription methods for school-age children's narrative language: automatic speech recognition and real-time transcription. Journal of Speech, Language, and Hearing Research, 64(9), 3533-3548.
Gupta, S., Shukla, R. S., Shukla, R. K., & Verma, R. (2020). Deep learning bidirectional LSTM based detection of prolongation and repetition in stuttered speech using weighted MFCC. International Journal of Advanced Computer Science and Applications, 11(9).
Howell, P., & Sackin, S. (1995, August). Automatic recognition of repetitions and prolongations in stuttered speech. In Proceedings of the first World Congress on fluency disorders (Vol. 2, pp. 372-374). Nijmegen, The Netherlands: University Press Nijmegen.
Howell, P., Davis, S., & Bartrip, J. (2009). The University College London Archive of Stuttered Speech (UCLASS). Journal of Speech, Language, and Hearing Research, 52(2), 556-569.
Jegan, R., & Jayagowri, R. (2022). MFCC and texture descriptors based stuttering dysfluencies classification using extreme learning machine. International Journal of Advanced Computer Science and Applications, 13(8).
Jouaiti, M., & Dautenhahn, K. (2022, May). Dysfluency classification in stuttered speech using deep learning for real-time applications. In ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 6482-6486). IEEE.
Kitchenham, B., & Charters, S. (2007). Guidelines for performing Systematic Literature Reviews in Software Engineering. Technical Report EBSE 2007-001, Keele University and Durham University Joint Report.
Kourkounakis, T., Hajavi, A., & Etemad, A. (2020, May). Detecting multiple speech disfluencies using a deep residual network with bidirectional long short-term memory. In ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 6089-6093). IEEE.
Kourkounakis, T., Hajavi, A., & Etemad, A. (2021). Fluentnet: End-to-end detection of stuttered speech disfluencies with deep learning. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 29, 2986-2999.
Lea, C., Mitra, V., Joshi, A., Kajarekar, S., & Bigham, J. P. (2021, June). Sep-28k: A dataset for stuttering event detection from podcasts with people who stutter. In ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 6798-6802). IEEE.
Liao, J., Eskimez, S., Lu, L., Shi, Y., Gong, M., Shou, L., ... & Zeng, M. (2023). Improving readability for automatic speech recognition transcription. ACM Transactions on Asian and Low-Resource Language Information Processing, 22(5), 1-23.
Manjutha, M., Subashini, P., Krishnaveni, M., & Narmadha, V. (2019, October). An optimized cepstral feature selection method for dysfluencies classification using Tamil speech dataset. In 2019 IEEE International Smart Cities Conference (ISC2) (pp. 671-677). IEEE.
Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.
Mishra, N., Gupta, A., & Vathana, D. (2021). Optimization of stammering in speech recognition applications. International Journal of Speech Technology, 24(3), 679-685.
Mitchell, T. (1997). Machine Learning. New York, NY, USA: McGraw-Hill.
Mohapatra, P., Islam, B., Islam, M. T., Jiao, R., & Zhu, Q. (2023, June). Efficient stuttering event detection using siamese networks. In ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 1-5). IEEE.
Mohapatra, P., Pandey, A., Islam, B., & Zhu, Q. (2022, July). Speech disfluency detection with contextual representation and data distillation. In Proceedings of the 1st ACM international workshop on intelligent acoustic systems and applications (pp. 19-24).
Murugan, K., Cherukuri, N. K., & Donthu, S. S. (2022, June). Efficient Recognition and Classification of Stuttered Word from Speech Signal using Deep Learning Technique. In 2022 IEEE World Conference on Applied Intelligence and Computing (AIC) (pp. 774-781). IEEE.
Oliveira, B. S. N., do Rêgo, L. G. C., Peres, L., da Silva, T. L. C., & de Macêdo, J. A. F. (2022). Processamento de linguagem natural via aprendizagem profunda. Sociedade Brasileira de Computação.
Oliveira, C. M. C., Correia, D. V., & Di Ninno, C. Q. M. S. (2023). Avaliação da Fluência. In C. A. S. Azoni, J. O. de Lira, D. A. C. Lamônica, D. B. de Oliveira e Britto (Orgs.), Tratado de Linguagem: perspectivas contemporâneas. (2ª ed., pp. 109-117). Ribeirão Preto, SP: Book Toy.
Oliveira, L. P., Santos, J. H. D. S., de Almeida, E. L., Barbosa, J. R., da Silva, A. W., de Azevedo, L. P., & da Silva, M. V. (2021, April). Deep learning library performance analysis on raspberry (IoT device). In International Conference on Advanced Information Networking and Applications (pp. 383-392). Cham: Springer International Publishing.
Ouzzani, M., Hammady, H., Fedorowicz, Z., & Elmagarmid, A. (2016). Rayyan—a web and mobile app for systematic reviews. Systematic reviews, 5, 1-10.
Prabhu, Y., & Seliya, N. (2022, December). A CNN-based automated stuttering identification system. In 2022 21st IEEE International Conference on Machine Learning and Applications (ICMLA) (pp. 1601-1605). IEEE.
Ratner, N. B., & MacWhinney, B. (2018). Fluency Bank: A new resource for fluency research and practice. Journal of fluency disorders, 56, 69-80.
Schneider, S., Baevski, A., Collobert, R., & Auli, M. (2019). wav2vec: Unsupervised pretraining for speech recognition. arXiv preprint arXiv:1904.05862.
Sharma, N. M., Kumar, V., Mahapatra, P. K., & Gandhi, V. (2023). Comparative analysis of various feature extraction techniques for classification of speech disfluencies. Speech Communication, 150, 23-31.
Sheikh, S. A., Sahidullah, M., Hirsch, F., & Ouni, S. (2021, August). Stutternet: Stuttering detection using time delay neural network. In 2021 29th European Signal Processing Conference (EUSIPCO) (pp. 426-430). IEEE.
Sheikh, S. A., Sahidullah, M., Hirsch, F., & Ouni, S. (2022, August). Robust stuttering detection via multi-task and adversarial learning. In 2022 30th European Signal Processing Conference (EUSIPCO) (pp. 190-194). IEEE.
Sheikh, S. A., Sahidullah, M., Hirsch, F., & Ouni, S. (2023). Advancing stuttering detection via data augmentation, class-balanced loss and multi-contextual deep learning. IEEE Journal of Biomedical and Health Informatics.
Sheikh, S. A., Sahidullah, M., Hirsch, F., & Ouni, S. (2022). Machine learning for stuttering identification: Review, challenges and future directions. Neurocomputing, 514, 385-402.
Su, R., Liu, X., Wang, L., & Yang, J. (2019). Cross-domain deep visual feature generation for mandarin audio–visual speech recognition. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 28, 185-197.
Published
25/06/2024
How to Cite
ALMEIDA, Rodrigo José S. de; SOUZA, Damires Yluska; OLIVEIRA, Luciana Pereira; CORREIA, Débora Vasconcelos; PINHEIRO, Samara Ruth Neves B.; SOUSA, Estevão S. da Silva. Aprendizado de máquina no apoio à transcrição e classificação da fala gaguejada: uma revisão sistemática da literatura. In: SIMPÓSIO BRASILEIRO DE COMPUTAÇÃO APLICADA À SAÚDE (SBCAS), 24., 2024, Goiânia/GO. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2024. p. 400-411. ISSN 2763-8952. DOI: https://doi.org/10.5753/sbcas.2024.2319.