Machine learning support for the transcription and classification of stuttered speech: a systematic literature review
Abstract
In the healthcare domain, stuttering identification is manually performed by speech therapists for diagnostic purposes. In this scenario, Machine Learning (ML) can be a valuable tool to support this activity, for example, by automating the transcription of stuttered speech and the classification of disfluencies. This work presents a systematic literature review aiming to investigate how studies have provided or utilized ML methods for transcription and classification of stuttered speech. It also seeks to identify to what extent these studies are applied to effectively support the clinical practice of speech therapists. This work also includes a survey of datasets, languages, diagnostic criteria, and challenges faced in stuttering identification.
Published
2024-06-25
How to Cite
ALMEIDA, Rodrigo José S. de; SOUZA, Damires Yluska; OLIVEIRA, Luciana Pereira; CORREIA, Débora Vasconcelos; PINHEIRO, Samara Ruth Neves B.; SOUSA, Estevão S. da Silva. Machine learning support for the transcription and classification of stuttered speech: a systematic literature review. In: BRAZILIAN SYMPOSIUM ON COMPUTING APPLIED TO HEALTH (SBCAS), 24., 2024, Goiânia/GO. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2024. p. 400-411. ISSN 2763-8952. DOI: https://doi.org/10.5753/sbcas.2024.2319.
