Automated Formal Register Scoring of Student Narrative Essays Written in Portuguese
Abstract
Automated essay scoring (AES) is the task of automatically assigning scores (i.e., grades) to written texts. Although AES has been widely studied for some genres (e.g., informational and argumentative essays), other types of text still need more attention. Narrative essays describe personal experiences and stories, either real or fictional. In this work, we describe a study on scoring student narrative essays written in Portuguese with respect to Formal Register, a criterion that evaluates the use of formal Brazilian Portuguese grammar and language proficiency. The dataset created in this study provides a rich corpus of narrative essays produced in response to a motivational situation, covering a diverse range of language proficiency levels and annotated by two professional graders. Several machine learning algorithms were evaluated using a diverse set of handcrafted linguistic features, and their results were compared against the manual scores assigned by the two human annotators. The results show that the proposed AES model achieved a level of agreement equivalent to that of the two human annotators.
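The comparison between model-human and human-human agreement can be illustrated with a small sketch. The abstract does not name the agreement statistic, so this example assumes Cohen's kappa as one common choice; the function name `cohen_kappa` and all score lists below are hypothetical and are not taken from the study's data.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)
    # Observed agreement: fraction of essays given identical scores.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement, from each rater's marginal score frequencies.
    count_a = Counter(ratings_a)
    count_b = Counter(ratings_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one and the same label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical Formal Register scores for eight essays (illustration only).
grader_1 = [1, 2, 3, 3, 2, 1, 4, 4]
grader_2 = [1, 2, 3, 2, 2, 1, 4, 3]
model = [1, 2, 3, 3, 2, 2, 4, 4]

human_human = cohen_kappa(grader_1, grader_2)
model_human = cohen_kappa(model, grader_1)
print(f"human vs. human kappa: {human_human:.3f}")
print(f"model vs. human kappa: {model_human:.3f}")
```

Under this assumption, a model whose kappa against a human grader is close to the kappa between the two graders would be described as reaching human-level agreement. For ordinal scores, a weighted variant of kappa may be a better fit than the unweighted form shown here.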