AllianceScan: identifying disfluencies in Brazilian Portuguese in transcribed psychotherapy audio to predict therapeutic alliance
Abstract
Speech disfluencies, such as pauses and repetitions, can indicate both cognitive processes and underlying emotional states. This study proposes AllianceScan, an approach for identifying disfluencies in transcriptions of Brazilian Portuguese audio, applied to psychotherapy sessions. Using speech recognition, the approach classifies four types of disfluencies and uses regression models to predict therapeutic alliance. The results, evaluated with metrics such as Mean Absolute Error (MAE), Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and Median Absolute Error (MDAE), indicate that the approach can improve the understanding of therapeutic interactions and identify signs of treatment progress.
References
Almeida, A. A. F. d., Behlau, M., and Leite, J. R. (2011). Correlação entre ansiedade e performance comunicativa. Revista da Sociedade Brasileira de Fonoaudiologia, 16:384–389.
Chen, Q., Chen, M., Li, B., and Wang, W. (2020). Controllable time-delay transformer for real-time punctuation prediction and disfluency detection. In ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 8069–8073.
Clavel, C. (2016). Sentiment analysis: From opinion mining to human-agent interaction. IEEE Transactions on Affective Computing, 7.
Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186, Minneapolis, Minnesota. Association for Computational Linguistics.
Flemotomos, N., Martinez, V. R., Chen, Z., Singla, K., Ardulov, V., Peri, R., Caperton, D. D., Gibson, J., Tanana, M. J., Georgiou, P., et al. (2022). Automated evaluation of psychotherapy skills using speech and language technologies. Behavior Research Methods, 54(2):690–711.
Frigerio-Domingues, C. and Drayna, D. (2017). Genetic contributions to stuttering: the current evidence. Molecular genetics & genomic medicine, 5(2):95–102.
Fukuti, P., Uchôa, C. L. M., Mazzoco, M. F., Cruz, I. D., Echegaray, M. V., Humes, E. d. C., Silveira, J. B., Santi, T. D., Miguel, E. C., and Corchs, F. (2021). COMVC-19: A program to protect healthcare workers’ mental health during the COVID-19 pandemic. What we have learned. Clinics, 76.
Hatcher, R. L. and Gillaspy, J. A. (2006). Development and validation of a revised short version of the working alliance inventory. Psychotherapy Research, 16(1):12–25.
Horii, K., Fukuda, M., Ohta, K., Nishimura, R., Ogawa, A., and Kitaoka, N. (2022). End-to-end spontaneous speech recognition using disfluency labeling. In Proc. Interspeech 2022, pages 4108–4112.
Horvath, A. O. and Greenberg, L. S. (1989). Development and validation of the working alliance inventory. Journal of counseling psychology, 36(2):223.
Khara, S., Singh, S., and Vir, D. (2018). A comparative study of the techniques for feature extraction and classification in stuttering. In 2018 Second International Conference on Inventive Communication and Computational Technologies (ICICCT), pages 887–893.
Kumar, Y. and Singh, N. (2019). A comprehensive view of automatic speech recognition system - a systematic literature review. In 2019 International Conference on Automation, Computational and Technology Management (ICACTM), pages 168–173.
Lee, D., Ko, B., Shin, M. C., Whang, T., Lee, D., Kim, E., Kim, E., and Jo, J. (2021). Auxiliary sequence labeling tasks for disfluency detection. In Proc. Interspeech 2021, pages 4229–4233.
Lee, K.-F., Hon, H.-W., and Reddy, R. (1990). An overview of the SPHINX speech recognition system. IEEE Transactions on Acoustics, Speech, and Signal Processing, 38(1):35–45.
Luna, A. S., Machado-Lima, A., and Nunes, F. L. S. (2022). Analysis of transcription tools for Brazilian Portuguese with focus on disfluency detection. In Proceedings of the 21st Brazilian Symposium on Human Factors in Computing Systems, pages 1–10.
Luna, A. S., Machado-Lima, A., and Nunes, F. L. S. (2025). Identification and classification of speech disfluencies: A systematic review on methods, databases, tools, evaluation and challenges. Journal of the Brazilian Computer Society, 31(1):154–173.
Maguire, G. A., Nguyen, D. L., Simonson, K. C., and Kurz, T. L. (2020). The pharmacologic treatment of stuttering and its neuropharmacologic basis. Frontiers in neuroscience, 14:158.
Maia, R. d. S., Araújo, T. C. S. d., Silva, N. G. d., and Maia, E. M. C. (2017). Instrumentos para avaliação da aliança terapêutica. Revista Brasileira de Terapias Cognitivas, 13:55–63.
Ming, F. J., Shabana Anhum, S., Islam, S., and Keoy, K. H. (2023). Facial emotion recognition system for mental stress detection among university students. In 2023 3rd International Conference on Electrical, Computer, Communications and Mechatronics Engineering (ICECCME), pages 1–6.
Negri, A., Christian, C., Mariani, R., Belotti, L., Andreoli, G., and Danskin, K. (2019). Linguistic features of the therapeutic alliance in the first session: a psychotherapy process study. Research in Psychotherapy: Psychopathology, Process, and Outcome, 22(1).
Povey, D., Ghoshal, A., Boulianne, G., Burget, L., Glembek, O., Goel, N., Hannemann, M., Motlicek, P., Qian, Y., Schwarz, P., Silovsky, J., Stemmer, G., and Vesely, K. (2011). The Kaldi speech recognition toolkit. In IEEE 2011 Workshop on Automatic Speech Recognition and Understanding. IEEE Signal Processing Society. IEEE Catalog No.: CFP11SRW-USB.
Prado, O. Z. and Meyer, S. B. (2006). Avaliação da relação terapêutica na terapia assíncrona via internet. Psicologia em estudo, 11:247–257.
Rocholl, J. C., Zayats, V., Walker, D. D., Murad, N. B., Schneider, A., and Liebling, D. J. (2021). Disfluency detection with unlabeled data and small BERT models. In Proc. Interspeech 2021, pages 766–770.
Ryu, J., Banthin, D. C., and Gu, X. (2021). Modeling therapeutic alliance in the age of telepsychiatry. Trends in Cognitive Sciences, 25(1):5–8.
Sakurai, M. and Kosaka, T. (2021). Emotion recognition combining acoustic and linguistic features based on speech recognition results. In 2021 IEEE 10th Global Conference on Consumer Electronics (GCCE), pages 824–827.
Serralta, F. B., Benetti, S., Laskoski, P. B., and Abs, D. (2020). The Brazilian-adapted working alliance inventory: preliminary report on the psychometric properties of the original and short revised versions. Trends in Psychiatry and Psychotherapy, 42:256–261.
Shriberg, E. E. (1994). Preliminaries to a theory of speech disfluencies. PhD thesis, University of California, Berkeley.
Swamy, T. J., Nandini, M., B, N., Karthika K, V., Anvitha, V. L., and Sunitha, C. (2022). Voice and gesture based virtual desktop assistant for physically challenged people. In 2022 6th International Conference on Trends in Electronics and Informatics (ICOEI), pages 222–226.
Treviso, M. V. and Aluísio, S. M. (2018). Sentence segmentation and disfluency detection in narrative transcripts from neuropsychological tests. In Computational Processing of the Portuguese Language (PROPOR), pages 409–418. Springer International Publishing.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., and Polosukhin, I. (2017). Attention is all you need. In Guyon, I., Luxburg, U. V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., and Garnett, R., editors, Advances in Neural Information Processing Systems, volume 30. Curran Associates, Inc.
Wang, S., Che, W., and Liu, T. (2016). A neural attention model for disfluency detection. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pages 278–287, Osaka, Japan. The COLING 2016 Organizing Committee.
Wang, S., Che, W., Zhang, Y., Zhang, M., and Liu, T. (2017). Transition-based disfluency detection using LSTMs. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 2785–2794, Copenhagen, Denmark. Association for Computational Linguistics.
Wu, S., Zhang, D., Zhou, M., and Zhao, T. (2015). Efficient disfluency detection with transition-based parsing. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 495–503, Beijing, China. Association for Computational Linguistics.
Yoshikawa, M., Shindo, H., and Matsumoto, Y. (2016). Joint transition-based dependency parsing and disfluency detection for automatic speech recognition texts. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 1036–1041, Austin, Texas. Association for Computational Linguistics.
Young, S., Evermann, G., Gales, M., Hain, T., Kershaw, D., Liu, X., Moore, G., Odell, J., Ollason, D., Povey, D., et al. (2002). The HTK book. Cambridge University Engineering Department, 3(175):12.
Zayats, V. and Ostendorf, M. (2019). Giving attention to the unexpected: Using prosody innovations in disfluency detection. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 86–95, Minneapolis, Minnesota. Association for Computational Linguistics.
Zilcha-Mano, S. (2017). Is the alliance really therapeutic? revisiting this question in light of recent methodological advances. American Psychologist, 72:311–325.
Published
9 June 2025
How to Cite
LUNA, Alana S.; MACHADO-LIMA, Ariane; NUNES, Fátima L. S. AllianceScan: identificação de disfluências em português brasileiro em áudios transcritos de psicoterapia para predizer aliança terapêutica. In: SIMPÓSIO BRASILEIRO DE COMPUTAÇÃO APLICADA À SAÚDE (SBCAS), 25., 2025, Porto Alegre/RS. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 437-448. ISSN 2763-8952. DOI: https://doi.org/10.5753/sbcas.2025.7260.