AutoIPA: development of an online platform to facilitate the use of automated phonetic transcription using Artificial Intelligence

  • Guilherme Brizzi UFSM
  • Ana Lilian Alfonso Toledo UFSM
  • Felipe Crivellaro Minuzzi UFSM

Abstract


Phonetic transcription is a useful tool for describing the variability inherent in language. However, the technique remains largely inaccessible to the general public. AutoIPA (autoipa.org) was developed in response: an AI-powered web platform designed to simplify access to and use of phonetic transcription in the International Phonetic Alphabet (IPA). To that end, several pre-existing models were studied and evaluated to support an application that automates phonetic transcription. Two findings emerged: labeled datasets are scarce, which constrains the performance of current models, and the existing tools are still not very accessible.

Published
2025-09-29
BRIZZI, Guilherme; TOLEDO, Ana Lilian Alfonso; MINUZZI, Felipe Crivellaro. AutoIPA: development of an online platform to facilitate the use of automated phonetic transcription using Artificial Intelligence. In: NATIONAL MEETING ON ARTIFICIAL AND COMPUTATIONAL INTELLIGENCE (ENIAC), 22., 2025, Fortaleza/CE. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 1388-1397. ISSN 2763-9061. DOI: https://doi.org/10.5753/eniac.2025.11801.