LLMs in Programming Education: Strategies for Assessment and Formative Feedback

  • Francisco Genivan Silva (UFRN / IFRN)
  • Eduardo H. S. Aranha (UFRN)

Abstract

Programming education faces challenges such as students' difficulty assimilating concepts and the heavy grading workload placed on instructors. Large Language Models (LLMs) have emerged as an alternative for providing immediate, adaptive feedback. This research investigates the feasibility of using LLMs for code assessment and formative feedback, analyzing both pedagogical and technical aspects. Experimental studies evaluate the effectiveness of the approach, with a focus on prompt engineering and computational optimization. The goal is to develop guidelines for implementing LLMs efficiently in programming education, ensuring scalability and pedagogical alignment.
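The prompt-engineering focus mentioned above can be illustrated with a minimal sketch. The template below is a hypothetical example of a formative-feedback prompt for an LLM code assessor, not the authors' actual prompts: it instructs the model to guide the student rather than reveal a corrected solution.

```python
def build_feedback_prompt(exercise: str, student_code: str) -> str:
    """Assemble a formative-feedback prompt for an LLM code assessor.

    The template wording is illustrative: it operationalizes
    "pedagogical alignment" by asking for guidance and guiding
    questions instead of a ready-made solution.
    """
    return (
        "You are a programming tutor giving formative feedback.\n\n"
        f"Exercise statement:\n{exercise}\n\n"
        f"Student submission:\n{student_code}\n\n"
        "Point out at most three issues, explain the underlying concept "
        "behind each one, and end with one guiding question. Do NOT "
        "reveal a full corrected solution."
    )


# Example usage with a toy (buggy) submission:
prompt = build_feedback_prompt(
    "Write a function that returns the sum of a list of numbers.",
    "def total(xs):\n"
    "    s = 0\n"
    "    for x in xs:\n"
    "        s = x\n"
    "    return s",
)
print(prompt)
```

The resulting string would then be sent to whichever LLM backend the study uses; the key design choice is that the pedagogical constraints live in the prompt itself, so the same template can be reused across exercises.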

Published
2025-04-07

SILVA, Francisco Genivan; ARANHA, Eduardo H. S. LLMs in Programming Education: Strategies for Assessment and Formative Feedback. In: WORKSHOP ON THESES AND DISSERTATIONS IN COMPUTING EDUCATION - BRAZILIAN SYMPOSIUM ON COMPUTING EDUCATION (EDUCOMP), 5., 2025, Juiz de Fora/MG. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 81-87. ISSN 3086-0741. DOI: https://doi.org/10.5753/educomp_estendido.2025.6614.