Automatic evaluation of short textual answers by n-gram similarities: refinements by linear regression
Abstract
In distance education, the need for intelligent virtual learning environments has been growing; one key component is a system for the automatic assessment of answers to conceptual open-ended questions. We work with answers to entrance-examination questions, using n-gram text-similarity techniques combined with linear regression. The system's accuracy was compared with that of human evaluators: 0.82 versus 0.94 on a Biology test, and 0.86 versus 0.85 on a Geography test. This study shows that the technology is mature enough to be used to great advantage in virtual teaching environments: it has low cost, gives instant feedback, frees the teacher from correction work, and scales to large classes.
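The pipeline described above — scoring a student answer by its n-gram overlap with a reference answer, then calibrating that similarity into a grade via linear regression — can be sketched as follows. This is a minimal illustration under assumptions, not the authors' actual system: the containment-style bigram overlap and the single-feature ordinary-least-squares fit are simplifications chosen for clarity.

```python
from collections import Counter

def ngrams(text, n):
    """Return the multiset of word n-grams in the text."""
    words = text.lower().split()
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))

def ngram_similarity(answer, reference, n=2):
    """Fraction of the reference's n-grams that also occur in the answer
    (a containment-style overlap in [0, 1])."""
    a, r = ngrams(answer, n), ngrams(reference, n)
    if not r:
        return 0.0
    return sum((a & r).values()) / sum(r.values())

def fit_linear(xs, ys):
    """Ordinary least squares for y = a*x + b with a single feature,
    standing in for the paper's linear-regression refinement step."""
    m = len(xs)
    mx, my = sum(xs) / m, sum(ys) / m
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    a = cov / var
    return a, my - a * mx
```

In use, one would compute `ngram_similarity` for each answer in a human-graded training set, fit `a, b = fit_linear(similarities, human_grades)`, and predict a grade for a new answer as `a * similarity + b`. A real system would combine several n-gram orders (and other features) in a multivariate regression.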
