Beyond Correctness: A Competency-Driven Framework for Designing Autograder Test Suites

Hugo F. Guarilha; Nicolas Arruda; Carla Delgado; Laura de O. F. Moraes

doi:10.5753/wei.2026.21827

Hugo F. Guarilha UFRJ
Nicolas Arruda UFRJ
Carla Delgado UFRJ
Laura de O. F. Moraes UNIRIO


Abstract

Programming autograders are essential for providing immediate feedback in programming education. However, conventional autograders are often limited to evaluating functional correctness through pass/fail tests. This article introduces a framework for designing autograder test suites in which a single programming problem is decomposed into multiple competencies. To grade a student's submission automatically, the tool lets the instructor assign a weight to each test case, supporting the design of a test suite aligned with the learning objectives of the predefined competencies. An experiment comparing this framework with traditional paper-based evaluation showed a 97% reduction in grading time and a correlation of r = 0.70 with the paper-based grades, while effectively shifting the instructor's role from grader to assessment designer.
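The abstract does not state how per-test weights are combined into a grade. A minimal sketch, assuming the grade is the weighted share of passed tests scaled to a 0 to 10 range, might look like the following; the `TestCase` schema, the `weighted_grade` function, the test names, and the competency labels are all hypothetical illustrations, not the framework's actual interface.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TestCase:
    """One autograder test tagged with the competency it probes (hypothetical schema)."""
    name: str
    competency: str
    weight: float


def weighted_grade(tests, passed, max_grade=10.0):
    """Return (overall grade, fraction of weight earned per competency).

    Assumption: the grade is the weighted share of passed tests, scaled to max_grade.
    """
    total = sum(t.weight for t in tests)
    if total <= 0:
        raise ValueError("the test suite needs a positive total weight")
    comp_total = defaultdict(float)   # weight available per competency
    comp_earned = defaultdict(float)  # weight earned per competency
    for t in tests:
        comp_total[t.competency] += t.weight
        if t.name in passed:
            comp_earned[t.competency] += t.weight
    earned = sum(comp_earned.values())
    per_competency = {c: comp_earned[c] / w for c, w in comp_total.items() if w > 0}
    return max_grade * earned / total, per_competency


# Hypothetical suite: one problem split into four competencies with instructor-chosen weights.
suite = [
    TestCase("reads_input", "input parsing", 1.0),
    TestCase("handles_empty_list", "edge cases", 2.0),
    TestCase("correct_sum", "core logic", 4.0),
    TestCase("large_input_fast", "efficiency", 3.0),
]
grade, per_competency = weighted_grade(suite, passed={"reads_input", "correct_sum"})
print(grade)  # 5.0
print(per_competency)
```

Returning per-competency fractions alongside the overall grade is one way to show which learning objective a submission missed, instead of collapsing everything into a single pass/fail signal.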

References

Arifi, S. M., Abdellah, I. N., Zahi, A., and Benabbou, R. (2015). Automatic program assessment using static and dynamic analysis. In 2015 Third World Conference on Complex Systems (WCCS), pages 1–6.

Barbosa, A. d. A., Costa, E. d. B., and Brito, P. H. (2023). Juízes online são suficientes ou precisamos de um var? In Anais do III Simpósio Brasileiro de Educação em Computação (EDUCOMP), pages 386–394, Porto Alegre. Sociedade Brasileira de Computação.

Combéfis, S. (2022). Automated code assessment for education: Review, classification and perspectives on techniques and tools. Software, 1:3–30.

Crisp, G. (2009). Assessment in Higher Education: Professional Practice and Future Challenges. Routledge.

Cruz, L. S., Santos, J. A. M., Coutinho, L. d. A. H., and Salvador, L. N. (2025). A reference model for presentation of studies in competency-based education in engineering and computing. In Anais do Simpósio Brasileiro de Educação em Computação (EDUCOMP), pages 59–71, Porto Alegre. Sociedade Brasileira de Computação.

Folloni Guarilha, H. (2025). Implementation of synchronous assessments on the machine teaching platform. Universidade Federal do Rio de Janeiro, Trabalho de Conclusão de Curso.

Hettiarachchi, E., Huertas, M. A., and Mor, E. (2013). Skill and knowledge e-assessment: A review of the state of the art. IN3 Working Paper Series, (2013):1–22.

Langove, S. A. and Khan, A. (2024). Automated grading and feedback systems: Reducing teacher workload and improving student performance. Journal of Asian Development Studies, 13(4):202–212.

McAlpine, M. (2002). Principles of Assessment. CETIS. Available online.

Messer, M., Brown, N. C. C., Kölling, M., and Shi, M. (2024). Automated grading and feedback tools for programming education: A systematic review. ACM Transactions on Computing Education, 24(1):10:1–10:43.

Messer, M., Brown, N. C. C., Kölling, M., and Shi, M. (2025). How consistent are humans when grading programming assignments? ACM Transactions on Computing Education, 25(4).

Moraes, L. O., Pedreira, C. E., Delgado, C. A. D. M., and Freire, J. P. (2021). Supporting Decisions Using Educational Data Analysis. In Anais Estendidos do Simpósio Brasileiro de Sistemas Multimídia e Web (WebMedia), pages 99–102. SBC.

Paiva, J. C., Figueira, A., and Leal, J. P. (2023). Bibliometric analysis of automated assessment in programming education: A deeper insight into feedback. Electronics, 12:2254.

Porfirio, A., Pereira, R., and Eleandro, M. (2021). A-learn evid: A method for identifying evidence of computer programming skills through automatic source code assessment. Revista Brasileira de Informática na Educação (RBIE).

Sidhu, G., Srinivasan, S., and Muhammad, N. (2021). Challenge-based and competency-based assessments in an undergraduate programming course. International Journal of Emerging Technologies in Learning (iJET), 16(23):17–28.