Automatic evaluation of essays in Portuguese based on feature collection and machine learning

  • Silverio Sirotheau Corrêa Neto Federal University of Pará https://orcid.org/0000-0002-5075-1975
  • Elói Luiz Favero Federal University of Pará
  • João Carlos Alves dos Santos Federal University of Pará
  • Simone Negrão de Freitas Federal University of Pará
  • Marco Aurélio Nascimento Júnior

Abstract


Virtual environments demand automatic methods for evaluating discursive questions. The literature reports promising methods for texts in English; for Portuguese, however, studies are still preliminary. This research presents an approach to the automatic evaluation of essays in Portuguese, based on feature collection and machine learning methods. The experiments used 1000 essays from a public tender. Features were collected along four dimensions: Lexical, Syntactic, Content, and Coherence. As a result, we obtained a quadratic Kappa (KQ) index of 0.68 between the system and human raters, versus a KQ of 0.56 between human raters.
Keywords: Automatic evaluation, features, machine learning
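
The agreement figures in the abstract are reported as a KQ index. Assuming KQ denotes Cohen's kappa with quadratic weights (the abstract does not spell out the formula), the following is a minimal sketch of how such an index could be computed for two lists of integer scores. The function name, the score range arguments, and the sample data are illustrative and not taken from the paper.

```python
from collections import Counter


def quadratic_weighted_kappa(rater_a, rater_b, min_score, max_score):
    """Cohen's kappa with quadratic weights for two lists of integer scores.

    Assumption: this is the 'KQ' index mentioned in the abstract.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("score lists must be non-empty and of equal length")
    if max_score <= min_score:
        raise ValueError("max_score must be greater than min_score")

    n = len(rater_a)
    span = (max_score - min_score) ** 2      # largest possible squared gap
    observed = Counter(zip(rater_a, rater_b))  # joint counts of (a, b) pairs
    hist_a = Counter(rater_a)                  # marginal counts, rater A
    hist_b = Counter(rater_b)                  # marginal counts, rater B

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(min_score, max_score + 1):
        for j in range(min_score, max_score + 1):
            weight = (i - j) ** 2 / span       # 0 on the diagonal, 1 at the extremes
            weighted_observed += weight * observed[(i, j)] / n
            weighted_expected += weight * hist_a[i] * hist_b[j] / (n * n)

    if weighted_expected == 0.0:
        # Both raters gave one and the same score to every essay.
        return 1.0
    return 1.0 - weighted_observed / weighted_expected


# Hypothetical scores for five essays on a 0-4 scale.
system_scores = [0, 1, 2, 3, 4]
human_scores = [0, 1, 2, 3, 4]
kq = quadratic_weighted_kappa(system_scores, human_scores, 0, 4)
```

A value of 1 indicates perfect agreement, 0 indicates agreement no better than chance, and negative values indicate systematic disagreement. scikit-learn's `cohen_kappa_score` with `weights="quadratic"` computes the same statistic and may be preferable in practice.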

Published
2020-11-24
SIROTHEAU CORRÊA NETO, Silverio; FAVERO, Elói Luiz; ALVES DOS SANTOS, João Carlos; FREITAS, Simone Negrão de; NASCIMENTO JÚNIOR, Marco Aurélio. Automatic evaluation of essays in Portuguese based on feature collection and machine learning. In: BRAZILIAN SYMPOSIUM ON COMPUTERS IN EDUCATION (SBIE), 31., 2020, Online. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2020. p. 1162-1171. DOI: https://doi.org/10.5753/cbie.sbie.2020.1162.