Essay-BR: a Brazilian Corpus of Essays

  • Jeziel C. Marinho, Universidade Federal do Piauí
  • Rafael T. Anchiêta, Instituto Federal do Piauí
  • Raimundo S. Moura, Universidade Federal do Piauí

Abstract

Automatic Essay Scoring (AES) is the computational task of evaluating and scoring written essays, aiming to provide models that grade essays automatically or with minimal human involvement. While there are several AES studies in a variety of languages, few focus on the Portuguese language, mainly because of the lack of a corpus with manually graded essays. To bridge this gap, we created a large corpus of essays written by Brazilian high school students on an online platform. All essays are argumentative and were scored across five competences by expert graders. Moreover, we conducted an experiment on the created corpus and showed the challenges posed by the Portuguese language. Our corpus is publicly available at https://github.com/rafaelanchieta/essay.

Keywords: Natural Language Processing, Automatic Essay Scoring, Essay Corpus
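To illustrate how a corpus of this kind can be explored, the minimal sketch below loads the essays with their five competence scores and prints a few basic statistics. It assumes, hypothetically, that the release in the GitHub repository can be exported as a CSV file (here called essay-br.csv) with one essay per row and columns essay, c1 through c5, and score; the actual file layout and column names may differ.

```python
# Minimal sketch for exploring Essay-BR under an assumed CSV layout.
# File name and column names below are hypothetical, not the official release format.
import pandas as pd

df = pd.read_csv("essay-br.csv")

COMPETENCES = ["c1", "c2", "c3", "c4", "c5"]

# Basic corpus statistics.
print(f"Number of essays: {len(df)}")
print(f"Average essay length (tokens): {df['essay'].str.split().str.len().mean():.1f}")

# Score distribution per competence and the mean total score.
print(df[COMPETENCES].describe())
print(f"Mean total score: {df['score'].mean():.1f}")
```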

Published
04/10/2021
How to Cite

MARINHO, Jeziel C.; ANCHIÊTA, Rafael T.; MOURA, Raimundo S. Essay-BR: a Brazilian Corpus of Essays. In: DATASET SHOWCASE WORKSHOP (DSW), 3., 2021, Rio de Janeiro. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2021. p. 53-64. DOI: https://doi.org/10.5753/dsw.2021.17414.