Machine Learning Classifiers with Acoustic Features for Prosodic Segmentation in Brazilian Portuguese: A Comprehensive Evaluation
Abstract
Automatic prosodic segmentation of spontaneous speech has not yet been widely explored for Brazilian Portuguese (BP). In this article, we compare seven types of classifiers, considering their performance across speaker profiles (varied genders, ages, education levels, and regions of birth) and their environmental impact, and train the most appropriate one. We propose a Random Forest classifier based on acoustic features, with low environmental impact and F1 scores of 0.55 and 0.77 with binary and macro averaging, respectively. We make the classifier publicly available and discuss its efficiency for different speaker profiles, as well as its environmental impact.
References
Ananthakrishnan, S. and Narayanan, S. S. (2008). Automatic prosodic event detection using acoustic, lexical, and syntactic evidence. IEEE Transactions on Audio, Speech, and Language Processing, 16(1):216–228.
Bäckström, T., Räsänen, O., Zewoudie, A., Perez Zarazaga, P., Das, S., et al. (2020). Introduction to speech processing. Library of Open Educational Resources.
Batista, C., Dias, A. L., and Neto, N. (2022). Free resources for forced phonetic alignment in Brazilian Portuguese based on Kaldi toolkit. EURASIP Journal on Advances in Signal Processing, 2022(1):11.
Berners-Lee, M. (2020). How bad are bananas?: the carbon footprint of everything. Profile Books.
Biron, T., Baum, D., Freche, D., Matalon, N., Ehrmann, N., Weinreb, E., Biron, D., and Moses, E. (2021). Automatic detection of prosodic boundaries in spontaneous speech. PLoS ONE, 16(5):1–21.
Boersma, P. and Weenink, D. (2025). Praat: doing phonetics by computer [Computer program]. Version 2025.
Chen, K. and Hasegawa-Johnson, M. A. (2004). How prosody improves word recognition. In Speech Prosody 2004.
Craveiro, G. M. and Galdino, J. C. (2025). Diversity in data for speech processing in Brazilian Portuguese. In Paes, A. and Verri, F. A. N., editors, Intelligent Systems, pages 122–136, Cham. Springer Nature Switzerland.
Craveiro, G. M., Santos, V. G., Dalalana, G. J. P., Svartman, F. R. F., and Aluísio, S. M. (2024). Simple and fast automatic prosodic segmentation of Brazilian Portuguese spontaneous speech. In Gamallo, P., Claro, D., Teixeira, A., Real, L., Garcia, M., Oliveira, H. G., and Amaro, R., editors, Proceedings of the 16th International Conference on Computational Processing of Portuguese Vol. 1, pages 32–44, Santiago de Compostela, Galicia/Spain. Association for Computational Linguistics.
Hoi, L. M., Sun, Y., and Im, S. K. (2022). An automatic speech segmentation algorithm of Portuguese based on spectrogram windowing. In 2022 IEEE World AI IoT Congress (AIIoT), pages 290–295.
Kocharov, D., Kachkovskaia, T., and Skrelin, P. (2017). Eliciting Meaningful Units from Speech. In Proc. Interspeech 2017, pages 2128–2132.
Lin, C.-H., You, C.-L., Chiang, C.-Y., Wang, Y.-R., and Chen, S.-H. (2019). Hierarchical prosody modeling for Mandarin spontaneous speech. The Journal of the Acoustical Society of America, 145(4):2576–2596.
Liu, S., Nakajima, Y., Chen, L., Arndt, S., Kakizoe, M., Elliott, M. A., and Remijn, G. B. (2022). How pause duration influences impressions of English speech: Comparison between native and non-native speakers. Frontiers in Psychology, 13.
Raso, T. and Mello, H. (2012). C-ORAL–BRASIL I: corpus de referência do português brasileiro falado informal. Editora UFMG, Belo Horizonte. 332 p. : il + 1 DVD-ROM.
Raso, T., Teixeira, B., and Barbosa, P. (2020). Modelling automatic detection of prosodic boundaries for Brazilian Portuguese spontaneous speech. Journal of Speech Sciences, 9:105–128.
Roll, N., Graham, C., and Todd, S. (2023). Psst! prosodic speech segmentation with transformers.
Serra, C. R. (2009). Realização e percepção de fronteiras prosódicas no português do Brasil: fala espontânea e leitura. PhD thesis, Federal University of Rio de Janeiro.
Viola, I. C. and Madureira, S. (2008). The roles of pause in speech expression. In Speech Prosody 2008, pages 721–724.
Published
2025-09-29
How to Cite
CRAVEIRO, Giovana M.; ALVES, Caroline A.; SVARTMAN, Flaviane; ALUÍSIO, Sandra M. Machine Learning Classifiers with Acoustic Features for Prosodic Segmentation in Brazilian Portuguese: A Comprehensive Evaluation. In: BRAZILIAN SYMPOSIUM IN INFORMATION AND HUMAN LANGUAGE TECHNOLOGY (STIL), 16., 2025, Fortaleza/CE. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 113-124. DOI: https://doi.org/10.5753/stil.2025.37818.
