Multimodal Summarization of Clinical Dialogues in Digital Primary Care: Integrating Text Messages and Audio

  • Davi Reis (UFMG)
  • Anderson A. Ferreira (UFOP)
  • Washington Cunha (UNICAMP)
  • Victor Macul (Ana Health)
  • Olivio Neto (Ana Health)
  • Jussara Almeida (UFMG)
  • Leonardo Rocha (UFSJ)
  • Marcos André Gonçalves (UFMG)

Abstract


Instant messaging platforms in digital health have increased the volume of interactions, making the management and retrieval of clinical information a central challenge in digital primary care. Although automatic summarization of text-based dialogues with Large Language Models (LLMs) has been explored, a substantial portion of these exchanges occurs through audio messages. In this work, we propose a multimodal pipeline that integrates speech and text for LLM-based dialogue summarization. We investigate (i) how to automatically extract clinically relevant information from audio messages of varying quality and (ii) how this integration affects summary quality. The methodology was developed using 706 real-world audio messages, a manually annotated dataset, and classifiers that filter out inadequate transcriptions. Results show that incorporating audio messages enriches the summaries by increasing contextualization and the level of clinical detail.
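As a rough illustration of the integration step described in the abstract, the sketch below merges text messages and audio transcriptions into a single chronological dialogue, dropping transcriptions that a filter marks as inadequate. It is not the authors' implementation: the `Message` fields, the `build_dialogue` function, and the `toy_is_adequate` word-count rule are all hypothetical stand-ins (the paper uses trained classifiers, whose details are not given here), and the resulting string is only assumed to be what an LLM summarizer would receive.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class Message:
    """One dialogue turn; all field names are hypothetical."""
    timestamp: float                  # position in the conversation
    sender: str                       # e.g. "patient" or "clinician"
    text: Optional[str] = None        # body of a text message
    transcript: Optional[str] = None  # speech-to-text output of an audio message


def build_dialogue(messages: Iterable[Message],
                   is_adequate: Callable[[str], bool]) -> str:
    """Merge text and audio turns chronologically, skipping transcripts
    that the adequacy filter rejects."""
    lines = []
    for msg in sorted(messages, key=lambda m: m.timestamp):
        if msg.text is not None:
            lines.append(f"{msg.sender}: {msg.text}")
        elif msg.transcript is not None and is_adequate(msg.transcript):
            lines.append(f"{msg.sender} [audio]: {msg.transcript}")
    return "\n".join(lines)


def toy_is_adequate(transcript: str) -> bool:
    """Placeholder for a trained classifier (assumption): keep transcripts
    with at least three words."""
    return len(transcript.split()) >= 3
```

In this sketch the filter is passed in as a function, so a trained classifier could replace the toy rule without changing the merging logic.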

Published
2026-06-01
REIS, Davi; FERREIRA, Anderson A.; CUNHA, Washington; MACUL, Victor; NETO, Olivio; ALMEIDA, Jussara; ROCHA, Leonardo; GONÇALVES, Marcos André. Multimodal Summarization of Clinical Dialogues in Digital Primary Care: Integrating Text Messages and Audio. In: BRAZILIAN SYMPOSIUM ON COMPUTING APPLIED TO HEALTH (SBCAS), 26., 2026, Ouro Preto/MG. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2026. p. 1367-1372. ISSN 2763-8952. DOI: https://doi.org/10.5753/sbcas.2026.21379.
