Finite-State Transducers for Oral Spelling Detection

  • Gabriel J. R. Soares UFJF
  • José E. C. Silva UFJF
  • Jairo F. de Souza UFJF

Abstract


Reading fluency assessment plays a central role in early education systems worldwide. Countries such as the United States and Brazil administer large-scale oral reading assessments to monitor educational outcomes and guide intervention. However, most automatic assessments are coarse-grained. As a result, they are poorly equipped to handle children who do not yet decode words fluently and instead spell out individual letters or syllables. We show that finite-state transducers can detect such oral spelling and thereby improve oral reading assessments. We demonstrate the effectiveness of our method on a corpus of annotated child speech, showing that it provides insight into early decoding strategies.

Published
24 November 2025
SOARES, Gabriel J. R.; SILVA, José E. C.; SOUZA, Jairo F. de. Finite-State Transducers for Oral Spelling Detection. In: SIMPÓSIO BRASILEIRO DE INFORMÁTICA NA EDUCAÇÃO (SBIE), 36., 2025, Curitiba/PR. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 263-275. DOI: https://doi.org/10.5753/sbie.2025.12213.