PATRICIA: a real-time singing synthesizer for Brazilian Portuguese
Abstract
This paper describes PATRICIA, a system that performs real-time singing voice synthesis (SVS) for Brazilian Portuguese. A technological mapping and a systematic review were conducted to survey the latest developments in real-time SVS and to guide PATRICIA’s design and implementation. The chosen approach is sample-based concatenative synthesis, with the song lyrics supplied in advance as text files. The most recently implemented functionalities are presented, and future enhancements are proposed to overcome the system’s current limitations.
References
Alivizatou-Barakou et al. (2017). Intangible Cultural Heritage and New Technologies: Challenges and Opportunities for Cultural Preservation and Development, pages 129–158. Springer International Publishing, Cham.
Brum, L. A. Z. (2023). Patricia: um sintetizador de canto em tempo real para o português brasileiro. Master’s thesis, Universidade Federal de Sergipe.
Brum, L. A. Z., Meneses, E. A. L., and Moreno, E. D. (2023). Patricia: a real-time singing synthesizer prototype for the Brazilian Portuguese language. In Proceedings of the International Computer Music Conference 2023, pages 176–180.
Brum, L. A. Z. and Moreno, E. D. (2019). State of art of real-time singing voice synthesis. In Anais do XVII Simpósio Brasileiro de Computação Musical, pages 50–57, Porto Alegre, RS, Brasil. SBC.
Brum, L. A. Z. and Moreno, E. D. (2020). Challenges and perspectives on real-time singing voice synthesis. Revista de Informática Teórica e Aplicada, 27(4):118–126.
Chan, P. Y., Dong, M., Ho, G. X. H., and Li, H. (2016). SERAPHIM: A Wavetable Synthesis System with 3D Lip Animation for Real-Time Speech and Singing Applications on Mobile Platforms. In Proc. Interspeech 2016, pages 1225–1229.
Delalez, S. and d’Alessandro, C. (2017). Adjusting the Frame: Biphasic Performative Control of Speech Rhythm. In Proceedings of Interspeech 2017, pages 864–868, Stockholm, Sweden.
Dong, M., Lee, S. W., Li, H., Chan, P., Peng, X., Ehnes, J. W., and Huang, D. (2014). I2R Speech2Singing perfects everyone’s singing. In Proc. Interspeech 2014, pages 2148–2149.
Dutoit, T., Pagel, V., Pierret, N., Bataille, F., and van der Vrecken, O. (1996). The MBROLA project: towards a set of high quality speech synthesizers free of use for non commercial purposes. In Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP ’96, volume 3, pages 1393–1396.
Feugère, L., d’Alessandro, C., Doval, B., and Perrotin, O. (2017). Cantor digitalis: chironomic parametric synthesis of singing. EURASIP Journal on Audio, Speech, and Music Processing, 2017(1):1–19.
Kagami, S., Hamano, K., Kashiwaze, K., and Yamamoto, K. (2012). Development of realtime Japanese vocal keyboard. Information Processing Society of Japan INTERACTION, pages 837–842.
Kashiwase, K. (2017). An over-the-shoulder keyboard that extends the potential for vocaloid performance. Yamaha Corporation. Accessed: 2023-01-29.
Kenmochi, H. and Ohshita, H. (2007). VOCALOID - commercial singing synthesizer based on sample concatenation. In Proc. Interspeech 2007, pages 4009–4010.
Kim, Y. E. (2008). Singing Voice Analysis, Synthesis, and Modeling, pages 359–374. Springer New York, New York, NY.
Kitchenham, B. (2004). Procedures for performing systematic reviews. Keele, UK, Keele University, 33(2004):1–26.
Kubozono, H. (1989). The mora and syllable structure in Japanese: Evidence from speech errors. Language and Speech, 32(3):249–278.
Le Beux, S., Feugère, L., and d’Alessandro, C. (2011). Chorus digitalis: experiment in chironomic choir singing. In Annual Conference of the International Speech Communication Association (INTERSPEECH 2011), pages 2005–2008. ISCA.
Locqueville, G., d’Alessandro, C., Delalez, S., Doval, B., and Xiao, X. (2020). Voks: Digital instruments for chironomic control of voice samples. Speech Communication, 125:97–113.
Matsubara, K., Okamoto, T., Takashima, R., Takiguchi, T., Toda, T., Shiga, Y., and Kawai, H. (2021). Full-band LPCNet: A real-time neural vocoder for 48 kHz audio with a CPU. IEEE Access, 9:94923–94933.
MIDI Manufacturers Association et al. (1996). The complete MIDI 1.0 detailed specification. Los Angeles, CA, The MIDI Manufacturers Association.
Moher, D., Liberati, A., Tetzlaff, J., Altman, D. G., and The PRISMA Group (2009). Preferred reporting items for systematic reviews and meta-analyses: The PRISMA statement. PLOS Medicine, 6(7):1–6.
Oura, K., Mase, A., Yamada, T., Muto, S., Nankaku, Y., and Tokuda, K. (2010). Recent development of the hmm-based singing voice synthesis system—sinsy. In Seventh ISCA Workshop on Speech Synthesis.
Petersen, K., Feldt, R., Mujtaba, S., and Mattsson, M. (2008). Systematic mapping studies in software engineering. In Proceedings of the 12th International Conference on Evaluation and Assessment in Software Engineering, EASE’08, pages 68–77, Swindon, GBR. BCS Learning Development Ltd.
Rodet, X., Potard, Y., and Barriere, J.-B. (1984). The chant project: From the synthesis of the singing voice to synthesis in general. Computer Music Journal, 8(3):15–31.
Tae, J., Kim, H., and Lee, Y. (2021). MLP Singer: Towards rapid parallel Korean singing voice synthesis. In 2021 IEEE 31st International Workshop on Machine Learning for Signal Processing (MLSP), pages 1–6.
Tan, X. (2023). Beyond Text-to-Speech Synthesis, pages 175–179. Springer Nature Singapore, Singapore.
Veaux, C., Astrinaki, M., Oura, K., Clark, R. A. J., and Yamagishi, J. (2013). Gesture control of HMM-based singing voice synthesis. In Proc. 8th ISCA Workshop on Speech Synthesis (SSW 8), pages 247–248.
Wells, J. C. et al. (1997). SAMPA computer readable phonetic alphabet. Handbook of standards and resources for spoken language systems, 4:684–732.
Published
2025-08-12
How to Cite
BRUM, Leonardo A. Z.; MENESES, Eduardo A. L.; MORENO, Edward D. PATRICIA: a real-time singing synthesizer for Brazilian Portuguese. In: REGIONAL SCHOOL ON COMPUTING OF BAHIA, ALAGOAS, AND SERGIPE (ERBASE), 25., 2025, Lagarto/SE. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 162-171. DOI: https://doi.org/10.5753/erbase.2025.13661.
