Assessing the Potential of Large Language Models (LLMs) for the Automatic Generation of Solutions for Programming Exercises

  • André N. Alcantara UFV
  • Mateus P. Silva UFV
  • Hugo N. Oliveira UFV
  • Julio C. S. Reis UFV

Abstract


Since several AI chatbots became available to the general public, the popularity of Large Language Models (LLMs) has grown, especially among students, who use them as educational support tools. This directly impacts the reproduction of scientific knowledge and the learning process. In this context, it is relevant to evaluate the performance and limitations of these tools in providing correct answers. This article discusses the results of C++ code solutions generated by the Meta Llama 3 model for practical programming tasks from the Olimpíada Brasileira de Informática (OBI). The results show that the model's output still requires a correctness check, with only 30.4% of answers correct overall. In addition, performance deteriorates on questions that include images, which the model does not accept as input, with a 20.3% drop in correct answers. Finally, the model appears to perform better on shorter problem statements, producing 31.1% more correct answers for questions with up to 200 words than for questions with more than 400 words.
Keywords: LLMs, Programming Exercises, Automatic Generation of Code
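The length-based analysis summarized above can be sketched as a small bucketing routine. This is a hypothetical illustration, not the paper's actual tooling: the function name, bucket boundaries, and input format are our assumptions, chosen to mirror the word-count ranges mentioned in the abstract.

```python
from collections import defaultdict

def accuracy_by_length(results, buckets=((0, 200), (201, 400), (401, float("inf")))):
    """Group graded solutions by problem-statement word count and report
    the fraction judged correct in each bucket.

    results: iterable of (word_count, judged_correct) pairs.
    Bucket boundaries follow the ranges discussed in the abstract
    (up to 200 words vs. more than 400 words); they are illustrative.
    """
    totals = defaultdict(lambda: [0, 0])  # bucket -> [correct, total]
    for words, correct in results:
        for lo, hi in buckets:
            if lo <= words <= hi:
                totals[(lo, hi)][0] += int(correct)
                totals[(lo, hi)][1] += 1
                break
    # Only report buckets that actually received problems.
    return {b: c / t for b, (c, t) in totals.items() if t}

# Toy run: five statements as (word_count, judged_correct) pairs.
sample = [(150, True), (180, False), (250, True), (450, False), (500, False)]
print(accuracy_by_length(sample))
```

A real pipeline would obtain `judged_correct` by compiling each generated C++ solution and running it against the official OBI test cases; here the verdicts are supplied directly.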

Published
10/11/2025
ALCANTARA, André N.; SILVA, Mateus P.; OLIVEIRA, Hugo N.; REIS, Julio C. S. Assessing the Potential of Large Language Models (LLMs) for the Automatic Generation of Solutions for Programming Exercises. In: CONCURSO DE TRABALHOS DE INICIAÇÃO CIENTÍFICA - SIMPÓSIO BRASILEIRO DE SISTEMAS MULTIMÍDIA E WEB (WEBMEDIA), 31., 2025, Rio de Janeiro/RJ. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 41-44. ISSN 2596-1683. DOI: https://doi.org/10.5753/webmedia_estendido.2025.16422.