Automatic Generation of Programming Questions Using LLMs: An Experience Report
Abstract
This paper reports on the experience of automatically generating 180 introductory programming questions with large language models (LLMs). The methodology was based on iterative cycles of prompt engineering that combined structured templates, guided examples (few-shot prompting), and automated refinement (self-refinement). The approach aimed to ensure clarity, completeness, and pedagogical alignment in the generated questions. The results indicate that the strategy is effective and replicable, contributing to the scalable, AI-supported production of educational content. Finally, lessons learned are presented to help educators and researchers apply these techniques.
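In outline, the pipeline described above can be read as a generate-critique-rewrite loop. The sketch below is illustrative only: the generic llm(prompt) -> str callable, the prompt template, the few-shot example, and the "OK" stopping rule are assumptions made for exposition, not the authors' actual prompts or tooling.

```python
# Minimal sketch of template-based generation with few-shot prompting
# followed by self-refinement. `llm` is a placeholder for any
# text-completion model the reader has access to.

TEMPLATE = """You are writing an introductory programming question.
Topic: {topic}
Follow the structure of the examples below.

{examples}

Now write one new question on this topic, with: a problem statement,
an input/output specification, and one sample test case."""

FEW_SHOT_EXAMPLES = """Example 1:
Statement: Read two integers and print their sum.
Input: two integers, one per line. Output: one integer.
Sample: input "2\\n3" -> output "5"."""

CRITIQUE_PROMPT = """Review the programming question below for clarity,
completeness (statement, I/O spec, sample test), and suitability for
beginners. List concrete problems, or reply "OK" if there are none.

{question}"""

REFINE_PROMPT = """Rewrite the question below, fixing every problem
listed in the critique. Keep the same structure.

Question:
{question}

Critique:
{critique}"""


def generate_question(llm, topic: str, max_rounds: int = 3) -> str:
    """Few-shot generation followed by up to `max_rounds` of self-refinement."""
    # Initial draft from the structured template plus guided examples.
    question = llm(TEMPLATE.format(topic=topic, examples=FEW_SHOT_EXAMPLES))
    for _ in range(max_rounds):
        # Ask the model to critique its own draft (self-refinement step).
        critique = llm(CRITIQUE_PROMPT.format(question=question))
        if critique.strip() == "OK":  # model found nothing left to fix
            break
        # Fold the critique back into a revised draft.
        question = llm(REFINE_PROMPT.format(question=question, critique=critique))
    return question
```

In this framing, the human iteration reported in the paper corresponds to tuning the template, the examples, and the critique rubric across cycles, while the inner loop automates the per-question refinement.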
