The Impact of Prosodic Segmentation on Speech Synthesis of Spontaneous Speech

  • Julio Galdino USP
  • Sidney Leal USP / Venturus
  • Leticia de Souza Unesp
  • Rodrigo Lima USP
  • Antonio Moreira USP
  • Arnaldo Candido Jr. Unesp
  • Miguel Oliveira Jr. UFAL
  • Edresson Casanova NVIDIA Corporation
  • Sandra Aluísio USP

Abstract

Spontaneous speech presents several challenges for speech synthesis, particularly in capturing the natural flow of conversation, including turn-taking, pauses, and disfluencies. Although speech synthesis systems have made significant progress in generating natural and intelligible speech, primarily through architectures that implicitly model prosodic features such as pitch, intensity, and duration, the construction of datasets with explicit prosodic segmentation and their impact on spontaneous speech synthesis remain largely unexplored. This paper evaluates the effects of manual and automatic prosodic segmentation annotations in Brazilian Portuguese on the quality of speech synthesized by a non-autoregressive model, FastSpeech 2. Experimental results show that training with prosodic segmentation produced slightly more intelligible and acoustically natural speech. Whereas automatic segmentation tends to create more regular segments, manual prosodic segmentation introduces greater variability, which contributes to more natural prosody. Analysis of neutral declarative utterances showed that both training approaches reproduced the expected nuclear accent pattern, but the prosodic model aligned more closely with natural pre-nuclear contours. To support reproducibility and future research, all datasets, source code, and trained models are publicly available under the CC BY-NC-ND 4.0 license.
Published
29/09/2025
GALDINO, Julio et al. The Impact of Prosodic Segmentation on Speech Synthesis of Spontaneous Speech. In: BRAZILIAN CONFERENCE ON INTELLIGENT SYSTEMS (BRACIS), 35., 2025, Fortaleza/CE. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 547-561. ISSN 2643-6264.