Estimating Textual Cohesion in Essays in the Context of ENEM Using Machine Learning Models
Abstract
Textual cohesion is a fundamental property of formal writing, as it concerns how well the elements of a text connect to one another. Although several studies analyze textual cohesion in essays automatically, few of them target Portuguese. This work investigates the use of regression models to estimate the textual cohesion of essays written in Portuguese in the context of ENEM (the Brazilian National High School Exam), using a set of 151 features identified in the literature. Experiments on the Essay-BR corpus, composed of 4,570 ENEM-style essays, show that the Extremely Randomized Trees model achieved the best results, with a moderate Pearson correlation of 53.08% (r ≈ 0.53) with the cohesion grades.
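As a reading aid for the reported metric, the following is a minimal sketch of how a Pearson correlation between predicted and reference cohesion grades can be computed. The function name `pearson_r` and the toy grade values are hypothetical and are not taken from the paper or from Essay-BR; the sketch only illustrates the metric, not the authors' pipeline.

```python
import math


def pearson_r(predicted, observed):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(predicted) != len(observed) or len(predicted) < 2:
        raise ValueError("need two sequences of equal length (at least 2 items)")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    # Sum of cross-products and sums of squared deviations from the means.
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    if var_p == 0 or var_o == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_p * var_o)


# Hypothetical toy data: reference cohesion grades and model estimates.
reference_grades = [120, 160, 80, 200, 160]
estimated_grades = [130, 150, 100, 170, 140]

r = pearson_r(estimated_grades, reference_grades)
print(f"Pearson r = {r:.4f} ({100 * r:.2f}%)")
```

Under this definition, a value reported as 53.08% corresponds to r ≈ 0.53 on a scale from -1 to 1.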
