Exploring Distinct Features for Automatic Short Answer Grading
Abstract
Automatic short answer grading is the field of study concerned with assessing students’ natural-language answers to questions. Grading is generally framed as a typical supervised classification problem. To stimulate research in the field, two datasets were publicly released for the SemEval 2013 competition task “Student Response Analysis”. Since then, several works have sought to improve on the results. In this context, the goal of this work is to tackle this task by effectively implementing lessons learned from the literature and to report results for both datasets and all of their scenarios. The proposed method obtained better results in most scenarios of the competition task and, therefore, higher overall scores than recent works.