Analysis of the effects of language on automatic response generation by LLM applications
Abstract
Currently, large language models (LLMs) can successfully solve problems typically used in introductory computing (CS1) courses. However, existing studies generally consider problems and prompts written in English. The objective of this work is to evaluate the effect of specifying problems in Portuguese versus English on the automatic generation of answers to problems applicable to introductory programming (CS1) classes. We observed that the success rates of ChatGPT and Bard are high for problems in both Portuguese and English, whereas HuggingChat does not achieve good results. Regarding prompt structure, none of the LLM applications produced a correct answer from the problem description alone; however, when more information is provided, such as output formatting rules, hints, and test cases, ChatGPT and Bard generally perform better.
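To illustrate the kind of prompt structuring described above, the sketch below assembles prompts with increasing amounts of context (description only, then output formatting rules, hints, and test cases) before sending them to an LLM application. This is a minimal illustration only: the paper does not publish its prompt templates, so the Problem fields, the example exercise, and the query_llm stub are assumptions, not the authors' actual setup.

```python
# Illustrative sketch only: field names, example texts, and the query_llm
# stub are assumptions; they do not reproduce the study's real prompts.
from dataclasses import dataclass


@dataclass
class Problem:
    description: str   # problem statement (in Portuguese or English)
    output_rules: str  # output formatting rules
    hints: str         # hints given to students
    test_cases: str    # sample input/output pairs


def build_prompt(problem: Problem, level: int) -> str:
    """Assemble a prompt with increasing amounts of context.

    level 1: description only
    level 2: + output formatting rules
    level 3: + hints
    level 4: + test cases
    """
    parts = [problem.description]
    if level >= 2:
        parts.append("Output formatting rules:\n" + problem.output_rules)
    if level >= 3:
        parts.append("Hints:\n" + problem.hints)
    if level >= 4:
        parts.append("Test cases:\n" + problem.test_cases)
    return "\n\n".join(parts)


def query_llm(prompt: str) -> str:
    """Placeholder: send the prompt to ChatGPT, Bard, or HuggingChat here."""
    raise NotImplementedError("Hook up the chosen LLM application.")


if __name__ == "__main__":
    exercise = Problem(
        description="Read two integers and print their sum.",
        output_rules="Print only the sum, followed by a newline.",
        hints="Use input() twice and convert each value with int().",
        test_cases="Input: 2 3 -> Output: 5",
    )
    for level in range(1, 5):
        print(f"--- Prompt level {level} ---")
        print(build_prompt(exercise, level))
```

In such a setup, the same exercise would be written once in Portuguese and once in English, and each prompt level would be submitted to each LLM application so that success rates can be compared across languages and prompt structures.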
