A Zero-Shot Prompting Approach for Automated Feedback Generation on ENEM Essays

  • Rafael T. Anchiêta (IFMA)
  • Anthony I. M. Luz (IFPI)
  • Shara L. C. Lopes (IFPI)
  • Raimundo S. Moura (UFPI)

Abstract

Automated Essay Scoring (AES) has made significant progress in evaluating written texts, but generating constructive feedback for essays, particularly in Portuguese, remains underexplored. This paper proposes a zero-shot prompting approach that automatically generates feedback, an essential component of formative assessment, for ENEM essays. We evaluate four Large Language Models (LLMs), namely Gemini 2.0 Flash, Sabiá 3, Llama 3-8B, and Qwen 3-8B, on their ability to provide feedback on the five competencies defined by the ENEM scoring rubric. Using the AES-ENEM dataset, we prompt each model to generate feedback for each competency and measure its semantic similarity to human feedback with the BERTScore metric. Additionally, a linguist with expertise in ENEM essays evaluates the feedback for constructiveness and informativeness. Our results show that, while the models perform similarly under BERTScore, the Qwen model produces the most informative and constructive feedback. This work contributes to the development of automated systems that not only grade essays but also help students improve their writing, potentially reducing teacher workload in large-scale assessments.
Keywords: essays, feedback, language model, zero-shot
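
The pipeline summarized in the abstract (one zero-shot prompt per competency, then a semantic comparison against human feedback) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the prompt wording and the competency descriptions are short English paraphrases rather than the authors' prompts, `build_prompt` and `greedy_match_f1` are hypothetical helper names, and the toy two-dimensional vectors stand in for contextual token embeddings. The similarity function follows the greedy-matching idea behind BERTScore but omits IDF weighting and baseline rescaling, so it is not a substitute for the metric used in the paper.

```python
from math import sqrt

# Short English paraphrases of the five ENEM competencies (assumed wording,
# not the prompts used by the authors).
COMPETENCIES = {
    1: "Mastery of the formal written norm of Portuguese",
    2: "Understanding of the essay prompt and development of the theme "
       "within the argumentative essay structure",
    3: "Selection, organization, and interpretation of information and "
       "arguments in defense of a point of view",
    4: "Use of the linguistic mechanisms needed to build the argumentation",
    5: "Proposal of an intervention for the problem discussed, "
       "respecting human rights",
}


def build_prompt(essay: str, competency: int) -> str:
    """Build a zero-shot prompt: instructions only, no worked examples."""
    return (
        "You are an evaluator of ENEM essays.\n"
        f"Competency {competency}: {COMPETENCIES[competency]}.\n"
        "Write constructive feedback on the essay below "
        "for this competency only.\n\n"
        f"Essay:\n{essay}"
    )


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))


def greedy_match_f1(candidate, reference):
    """BERTScore-style F1 over token embeddings.

    Each candidate token is matched to its most similar reference token
    (precision) and vice versa (recall). No IDF weighting or rescaling.
    """
    precision = sum(
        max(cosine(c, r) for r in reference) for c in candidate
    ) / len(candidate)
    recall = sum(
        max(cosine(r, c) for c in candidate) for r in reference
    ) / len(reference)
    return 2 * precision * recall / (precision + recall)


# One prompt per competency for a placeholder essay.
prompts = [build_prompt("(essay text)", k) for k in COMPETENCIES]

# Toy embeddings standing in for model-generated and human feedback tokens.
model_tokens = [[1.0, 0.0], [0.0, 1.0]]
human_tokens = [[1.0, 0.0], [1.0, 1.0]]
score = greedy_match_f1(model_tokens, human_tokens)
print(len(prompts), round(score, 3))
```

In practice the embeddings would come from a pretrained encoder and the prompts would be sent to each LLM; both steps are left out so the sketch stays self-contained.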

Published
10 November 2025
ANCHIÊTA, Rafael T.; LUZ, Anthony I. M.; LOPES, Shara L. C.; MOURA, Raimundo S. A Zero-Shot Prompting Approach for Automated Feedback Generation on ENEM Essays. In: BRAZILIAN SYMPOSIUM ON MULTIMEDIA AND THE WEB (WEBMEDIA), 31., 2025, Rio de Janeiro/RJ. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 511-515. DOI: https://doi.org/10.5753/webmedia.2025.15377.
