MedTalkAI: Assisted Anamnesis Creation With Automatic Speech Recognition

  • Yanna Torres Gonçalves, Universidade Federal do Ceará (UFC)
  • João Victor B. Alves, Universidade Federal do Ceará (UFC)
  • Breno Alef Dourado Sá, Universidade Federal do Ceará (UFC)
  • Lázaro Natanael da Silva, Universidade Federal do Ceará (UFC)
  • José A. Fernandes de Macedo, Universidade Federal do Ceará (UFC)
  • Ticiana L. Coelho da Silva, Universidade Federal do Ceará (UFC)
Abstract


Conventional approaches to documenting patient medical histories are often time-consuming and require significant involvement from healthcare professionals. This paper introduces MedTalkAI, which integrates ASR models, including Whisper and Wav2Vec 2.0, to efficiently transcribe audio recordings of patient histories in Brazilian Portuguese. MedTalkAI supports validating, correcting, and evaluating these transcriptions, facilitating the creation of a unique medical audio-text database. Additionally, MedTalkAI enhances ASR models for medical applications by incorporating language models. This approach aims to improve the transcription and analysis of medical histories, contributing to the development of more reliable ASR models and to the automation of the documentation process.
Keywords: Medical History, Automatic Speech Recognition, Language Model
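
As an illustration of the transcription step described above, the minimal sketch below invokes both model families through the Hugging Face transformers pipeline. The checkpoint names (openai/whisper-small and jonatasgrosman/wav2vec2-large-xlsr-53-portuguese) and the file anamnesis_sample.wav are assumptions made for this sketch, not the models or data actually used by MedTalkAI.

```python
# Illustrative sketch only, not the MedTalkAI implementation.
# Assumes Hugging Face `transformers` (with ffmpeg available for audio decoding)
# and the public checkpoints named below.
from transformers import pipeline


def transcribe_with_whisper(audio_path: str) -> str:
    """Transcribe a Brazilian Portuguese recording with a multilingual Whisper checkpoint."""
    asr = pipeline("automatic-speech-recognition", model="openai/whisper-small")
    result = asr(
        audio_path,
        generate_kwargs={"language": "portuguese", "task": "transcribe"},
    )
    return result["text"]


def transcribe_with_wav2vec2(audio_path: str) -> str:
    """Transcribe the same recording with a Wav2Vec 2.0 checkpoint fine-tuned on Portuguese."""
    asr = pipeline(
        "automatic-speech-recognition",
        model="jonatasgrosman/wav2vec2-large-xlsr-53-portuguese",  # assumed public checkpoint
    )
    return asr(audio_path)["text"]


if __name__ == "__main__":
    sample = "anamnesis_sample.wav"  # hypothetical anamnesis recording
    print("Whisper:    ", transcribe_with_whisper(sample))
    print("Wav2Vec 2.0:", transcribe_with_wav2vec2(sample))
```

The language-model enhancement mentioned in the abstract could, for instance, be layered on the Wav2Vec 2.0 CTC output as n-gram rescoring with a toolkit such as KenLM (Heafield, 2011); the sketch above covers only plain decoding.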

References

Baevski, A., Zhou, Y., Mohamed, A., and Auli, M. (2020). wav2vec 2.0: A framework for self-supervised learning of speech representations. In NeurIPS, pages 12449–12460.

Besacier, L., Barnard, E., Karpov, A., and Schultz, T. (2014). Automatic speech recognition for under-resourced languages: A survey. Speech Communication, 56:85–100.

da Silva, T. L. C., Magalhães, R. P., de Macêdo, J. A., Araújo, D., Araújo, N., de Melo, V. T., Olímpio, P., Rego, P. A., and Neto, A. V. L. (2019). Improving named entity recognition using deep learning with human in the loop. In EDBT, pages 594–597.

Gür, B. (2012). Improving speech recognition accuracy for clinical conversations. PhD thesis, Massachusetts Institute of Technology.

Heafield, K. (2011). KenLM: Faster and smaller language model queries. In Proceedings of the Sixth Workshop on Statistical Machine Translation, pages 187–197.

Li, J., Lavrukhin, V., Ginsburg, B., Leary, R., Kuchaiev, O., Cohen, J. M., Nguyen, H., and Gadde, R. T. (2019). Jasper: An End-to-End Convolutional Neural Acoustic Model. In Proc. Interspeech 2019, pages 71–75. ISCA.

Li, Y., Yu, B., Quangang, L., and Liu, T. (2021). FITAnnotator: A flexible and intelligent text annotation system. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations, pages 35–41.

Radford, A., Kim, J. W., Xu, T., Brockman, G., McLeavey, C., and Sutskever, I. (2023). Robust speech recognition via large-scale weak supervision. In ICML, pages 28492–28518.

Rubenstein, P. K., Asawaroengchai, C., Nguyen, D. D., Bapna, A., Borsos, Z., Quitry, F. d. C., Chen, P., Badawy, D. E., Han, W., Kharitonov, E., et al. (2023). AudioPaLM: A large language model that can speak and listen. arXiv preprint arXiv:2306.12925.

Schneider, S., Baevski, A., Collobert, R., and Auli, M. (2019). wav2vec: Unsupervised pre-training for speech recognition. In Interspeech 2019, pages 3465–3469.

Stolcke, A. (2002). SRILM - an extensible language modeling toolkit. In Seventh International Conference on Spoken Language Processing.

Sullivan, P., Shibano, T., and Abdul-Mageed, M. (2022). Improving automatic speech recognition for non-native English with transfer learning and language model decoding. In AANLSP, pages 21–44.
Published
14/10/2024
GONÇALVES, Yanna Torres; ALVES, João Victor B.; SÁ, Breno Alef Dourado; SILVA, Lázaro Natanael da; MACEDO, José A. Fernandes de; COELHO DA SILVA, Ticiana L. MedTalkAI: Assisted Anamnesis Creation With Automatic Speech Recognition. In: DEMONSTRAÇÕES E APLICAÇÕES - SIMPÓSIO BRASILEIRO DE BANCO DE DADOS (SBBD), 39., 2024, Florianópolis/SC. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2024. p. 83-88. DOI: https://doi.org/10.5753/sbbd_estendido.2024.243214.