Automatic Punctuation Verification of School Students’ Essay in Portuguese
Abstract
Textual production is a key activity at different levels of education. The analysis of essays encompasses several criteria, such as lexical and syntactic errors, cohesion, and coherence. Among these criteria, how students use punctuation (i.e., sentence-final marks and commas) can influence the quality of the final text. The literature has proposed several approaches to verifying the correctness of punctuation in students' essays written in English. However, despite advances in natural language processing models for other languages, a significant gap remains in punctuation verification for those languages. This paper therefore proposes a new approach, based on state-of-the-art language models, to develop a punctuation prediction method for Portuguese. The proposed model was applied to evaluate the textual productions of students in Brazilian public schools. Finally, the results of the study and their practical implications for educational settings are discussed.