Development and Evaluation of an Intelligent Tutor for Programming Learning Based on Extensive Language Models
Abstract
The emergence of automatic programming following the popularization of Generative Artificial Intelligence has raised uncertainty about the future of programming and how it is taught. This doctoral work proposes the design and evaluation of an architecture for developing Intelligent Tutoring Systems for programming learning, integrating Large Language Models to provide a personalized user experience. The architecture is developed through a design-based research methodology, and its effect on cognitive engagement and learning is assessed through formative prototype evaluations and a summative evaluation conducted via an intervention study.
Keywords:
Software Engineering, Large Language Models, Intelligent Tutoring Systems, Generative Artificial Intelligence, Programming Learning, Learning Personalization
References
Bommasani, R., Hudson, D. A., Adeli, E., Altman, R., Arora, S., von Arx, S., ... & Liang, P. (2021). “On the opportunities and risks of foundation models”. arXiv preprint. DOI: 10.48550/arXiv.2108.07258
Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J. D., Dhariwal, P., ... & Amodei, D. (2020). “Language models are few-shot learners”. Advances in Neural Information Processing Systems, 33, 1877-1901. DOI: 10.48550/arXiv.2005.14165
Denny, P., Prather, J., Becker, B. A., Finnie-Ansley, J., Hellas, A., Leinonen, J., ... & Sarsa, S. (2024). “Computing education in the era of generative AI”. Communications of the ACM, 67(2), 56-67. DOI: 10.1145/3624720
Du Plooy, E., Casteleijn, D., & Franzsen, D. (2024). “Personalized adaptive learning in higher education: a scoping review of key characteristics and impact on academic performance and engagement”. Heliyon, 10(21), e39630. DOI: 10.1016/j.heliyon.2024.e39630
Fernández, L. R., Mena, A. L. F., Magaña, M. P. T., Magaña, M. A. R., & Fernández, M. A. R. (2024). “Inteligencia artificial en la educación: Modelo de lenguaje de gran tamaño (LLM) como recurso educativo”. Revista IPSUMTEC, 7(2), 157-164. DOI: 10.61117/ipsumtec.v7i2.321
Gao, L., Lu, J., Shao, Z., Lin, Z., Yue, S., Ieong, C., ... & Chen, S. (2024). “Fine-tuned large language model for visualization system: A study on self-regulated learning in education”. IEEE Transactions on Visualization and Computer Graphics. DOI: 10.1109/TVCG.2024.3456145
Goodfellow, I., Bengio, Y., & Courville, A. (2016). “Deep learning”. MIT Press.
Halverson, L. R., & Graham, C. R. (2019). “Learner Engagement in Blended Learning Environments: A Conceptual Framework”. Online Learning, 23, 145-178. DOI: 10.24059/olj.v23i2.1481
Johannesson, P., & Perjons, E. (2021). “An Introduction to Design Science”. In Springer eBooks. DOI: 10.1007/978-3-030-78132-3
Khan, H., Gul, R., & Zeb, M. (2023). “The Effect of Students’ Cognitive and Emotional Engagement on Students’ Academic Success and Academic Productivity”. Journal of Social Sciences Review, 3(1), 322-334. DOI: 10.54183/jssr.v3i1.141
Lange, C. (2021). “The relationship between e-learning personalization and cognitive load”. Open Learning: The Journal of Open, Distance and e-Learning, 38(3), 228-242. DOI: 10.1080/02680513.2021.2019577
Levchuk, O. (2024). Diseño y evaluación de un tutor inteligente basado en Inteligencia Artificial Generativa para la adquisición de habilidades de programación. Master of Science thesis. CICESE, Baja California, Mexico. 92 pp.
Levchuk, O., Sánchez, C., Pacheco, N., López, I., & Favela, J. (2024). “Interaction Design (IxD) of an Intelligent Tutor for Programming Learning Based on LLM”. Avances en Interacción Humano-Computadora, 9(1), 1–10. DOI: 10.47756/aihc.y9i1.137
Liu, Z., He, X., Liu, L., Liu, T., & Zhai, X. (2023). “Context matters: A strategy to pre-train language model for science education”. In International Conference on Artificial Intelligence in Education, 666-674. Cham: Springer Nature Switzerland. DOI: 10.1007/978-3-031-36336-8_103
Qureshi, B. (2023). “Exploring the use of ChatGPT as a tool for learning and assessment in undergraduate computer science curriculum: Opportunities and challenges”. arXiv preprint. DOI: 10.48550/arXiv.2304.11214
Radford, A., Wu, J., Child, R., Luan, D., Amodei, D. & Sutskever, I. (2019). “Language Models are Unsupervised Multitask Learners”. OpenAI.
Rahman, M. M., & Watanobe, Y. (2023). “ChatGPT for Education and Research: Opportunities, Threats, and Strategies”. Applied Sciences, 13(9), 5783. DOI: 10.3390/app13095783
Scherer, R., Siddiq, F., & Viveros, B. S. (2020). “A meta-analysis of teaching and learning computer programming: Effective instructional approaches and conditions”. Computers in Human Behavior, 109, 106349. DOI: 10.1016/j.chb.2020.106349
Schmucker, R., Xia, M., Azaria, A., & Mitchell, T. (2023). “Ruffle&Riley: Towards the automated induction of conversational tutoring systems”. arXiv preprint. DOI: 10.48550/arXiv.2310.01420
Singh, D., & Rajendran, R. (2024). “Cognitive engagement as a predictor of learning gain in Python programming”. Smart Learning Environments, 11(1). DOI: 10.1186/s40561-024-00330-9
Sonkar, S., Ni, K., Chaudhary, S., & Baraniuk, R. G. (2024). “Pedagogical alignment of large language models”. arXiv preprint. DOI: 10.48550/arXiv.2402.05000
Tamkin, A., Liu, K., Valle, R., & Clark, J. (2025). “Which Economic Tasks are Performed with AI? Evidence from Millions of Claude Conversations”. Anthropic. assets.anthropic.com/m/2e23255f1e84ca97/original/Economic_Tasks_AI_Paper.pdf
Vaithilingam, P., Zhang, T., & Glassman, E. L. (2022). “Expectation vs. experience: Evaluating the usability of code generation tools powered by large language models”. CHI EA '22: CHI Conference on Human Factors in Computing Systems Extended Abstracts, Article 332, 1–7. DOI: 10.1145/3491101.3519665
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). “Attention is all you need”. Advances in Neural Information Processing Systems, 30, 5998–6008. DOI: 10.48550/arXiv.1706.03762
Zhai, X., & Wiebe, E. (2023). “Technology-based innovative assessment”. In Classroom-Based STEM Assessment: Contemporary Issues and Perspectives, 99–125.
Published
2025-05-12
How to Cite
LEVCHUK, Oleksiy.
Development and Evaluation of an Intelligent Tutor for Programming Learning Based on Extensive Language Models. In: IBERO-AMERICAN CONFERENCE ON SOFTWARE ENGINEERING (CIBSE), 28., 2025, Ciudad Real, Spain. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 273-280. DOI: https://doi.org/10.5753/cibse.2025.35313.
