Automatic evaluation of short textual answers by n-gram similarities: refinements by linear regression
Abstract
In distance education, the need for intelligent virtual learning environments has been growing; one key component is a system for the automatic assessment of answers to conceptual open-ended questions. We work with answers to entrance-examination questions, using n-gram text-similarity techniques combined with linear regression. The system's accuracy was compared with that of human evaluators: 0.82 versus 0.94 on a Biology test, and 0.86 versus 0.85 on a Geography test. This study shows that the technology is mature enough to be used to great advantage in virtual teaching environments: it has low cost, gives instant feedback, frees the teacher from correction work, and scales to large classes.
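The pipeline described above — scoring a student answer by its n-gram overlap with a reference answer, then calibrating that similarity into a grade via linear regression — can be sketched as follows. This is a minimal illustration under assumptions, not the authors' actual system: the containment-style bigram overlap and the single-feature ordinary-least-squares fit are simplifications chosen for clarity.

```python
from collections import Counter

def ngrams(text, n):
    """Return the multiset of word n-grams in the text."""
    words = text.lower().split()
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))

def ngram_similarity(answer, reference, n=2):
    """Fraction of the reference's n-grams that also occur in the answer
    (a containment-style overlap in [0, 1])."""
    a, r = ngrams(answer, n), ngrams(reference, n)
    if not r:
        return 0.0
    return sum((a & r).values()) / sum(r.values())

def fit_linear(xs, ys):
    """Ordinary least squares for y = a*x + b with a single feature,
    standing in for the paper's linear-regression refinement step."""
    m = len(xs)
    mx, my = sum(xs) / m, sum(ys) / m
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    a = cov / var
    return a, my - a * mx
```

In use, one would compute `ngram_similarity` for each answer in a human-graded training set, fit `a, b = fit_linear(similarities, human_grades)`, and predict a grade for a new answer as `a * similarity + b`. A real system would combine several n-gram orders (and other features) in a multivariate regression.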
