Code on Demand: A Comparative Analysis of the Efficiency, Understandability, and Self-Correction Capability of Copilot, ChatGPT, and Gemini

  • Samuel Silvestre Batista (UFPI)
  • Bruno Branco (UFPI)
  • Otávio Castro (UFPI)
  • Guilherme Avelino (UFPI)

Abstract

The increasing demand for software development and the shortage of skilled developers have catalyzed the emergence of AI-powered code generation tools as potential solutions to enhance productivity and accessibility. While these tools promise to revolutionize software development, little research critically evaluates their performance, particularly concerning code quality, comprehensibility, and self-correction capabilities. This paper therefore investigates the efficacy, comprehensibility, and self-repair capabilities of three prominent AI code generation tools: GitHub Copilot, ChatGPT, and Gemini. We replicate and extend the study by Nguyen and Nadi, evaluating the tools’ performance on 33 LeetCode problems across four programming languages: Python, Java, JavaScript, and C. Our findings show notable variation in performance across tools and languages. Copilot achieved the highest accuracy in Java (87.88%), ChatGPT performed consistently well across Python, Java, and JavaScript (78.79%), and Gemini excelled in JavaScript (75.76%). Copilot generally generated more understandable code, as indicated by lower cognitive and cyclomatic complexity, whereas ChatGPT and Gemini exhibited higher variability. All three tools demonstrated promising self-repair capabilities, but their effectiveness varied with the type of error and the programming language. This study provides valuable insights into the strengths and limitations of these AI-powered tools, informing their practical application in software development.
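To illustrate the kind of understandability metric the abstract refers to, the sketch below approximates McCabe cyclomatic complexity (1 + decision points) for Python code using only the standard library. It is a minimal illustration under our own assumptions, not the authors’ actual measurement harness; the names cyclomatic_complexity and two_sum are ours.

    import ast
    import textwrap

    # Branch-introducing AST nodes counted as decision points.
    DECISION_NODES = (ast.If, ast.For, ast.AsyncFor, ast.While,
                      ast.ExceptHandler, ast.IfExp)

    def cyclomatic_complexity(source: str) -> dict[str, int]:
        """Approximate McCabe complexity (1 + decision points) per function."""
        scores = {}
        for node in ast.walk(ast.parse(source)):
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                decisions = sum(isinstance(n, DECISION_NODES)
                                for n in ast.walk(node))
                # Each and/or chain adds (operand count - 1) extra branches.
                decisions += sum(len(n.values) - 1 for n in ast.walk(node)
                                 if isinstance(n, ast.BoolOp))
                scores[node.name] = 1 + decisions
        return scores

    # Hypothetical tool-generated solution to a LeetCode-style problem.
    sample = textwrap.dedent('''
        def two_sum(nums, target):
            seen = {}
            for i, x in enumerate(nums):
                if target - x in seen:
                    return [seen[target - x], i]
                seen[x] = i
    ''')
    print(cyclomatic_complexity(sample))  # {'two_sum': 3}

Note that cognitive complexity, the study’s other understandability metric, is defined differently (it additionally penalizes nesting), so a full harness would compute it separately.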
Published
November 5, 2024
BATISTA, Samuel Silvestre; BRANCO, Bruno; CASTRO, Otávio; AVELINO, Guilherme. Code on Demand: A Comparative Analysis of the Efficiency, Understandability, and Self-Correction Capability of Copilot, ChatGPT, and Gemini. In: SIMPÓSIO BRASILEIRO DE QUALIDADE DE SOFTWARE (SBQS), 23., 2024, Bahia/BA. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2024. p. 351–361.