Comparative Evaluation of GitHub Copilot and Amazon CodeWhisperer in Automatic Source Code Generation

  • Pedro C. Miranda (PUC Minas)
  • Michelle Hanne S. de Andrade (PUC Minas)

Abstract


Technology companies such as Amazon and OpenAI have been investing in Large Language Models (LLMs) that assist developers in generating source code. This work evaluates the effectiveness of two automatic source code generation tools, GitHub Copilot and Amazon CodeWhisperer. These tools use language models trained on public GitHub source code to generate suggestions from natural language descriptions. The study collected 33 programming problems from the LeetCode website, to be solved in three programming languages: Python, JavaScript, and Java. For each language, source code was generated with GitHub Copilot and with Amazon CodeWhisperer, and the generated code was then submitted to the LeetCode platform for analysis. The results indicate that GitHub Copilot achieved a higher accuracy rate (77.78%) than Amazon CodeWhisperer (64.8%), although no significant differences were found in the efficiency or readability of the generated code.
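
As an illustration of the accuracy metric, the sketch below computes a per-tool acceptance rate from a list of submission verdicts. It assumes that "accuracy rate" means the share of submissions accepted by LeetCode, which the abstract does not define explicitly. The record layout, the function name `accuracy_by_tool`, and the sample verdicts are hypothetical and are not the study's data.

```python
from collections import defaultdict

# Hypothetical verdict records: (tool, language, problem_id, accepted).
# These values are illustrative only and are NOT the study's data.
submissions = [
    ("copilot", "python", 1, True),
    ("copilot", "java", 1, False),
    ("copilot", "javascript", 1, True),
    ("codewhisperer", "python", 1, True),
    ("codewhisperer", "java", 1, False),
    ("codewhisperer", "javascript", 1, False),
]


def accuracy_by_tool(records):
    """Return {tool: percentage of that tool's submissions that were accepted}."""
    accepted = defaultdict(int)
    total = defaultdict(int)
    for tool, _language, _problem_id, ok in records:
        total[tool] += 1
        if ok:
            accepted[tool] += 1
    # Assumption: accuracy = accepted submissions / all submissions, per tool.
    return {tool: 100.0 * accepted[tool] / total[tool] for tool in total}


rates = accuracy_by_tool(submissions)
for tool, rate in sorted(rates.items()):
    print(f"{tool}: {rate:.2f}%")
```

Grouping by `(tool, language)` instead of `tool` alone would give a per-language breakdown under the same assumption.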

Published
2024-09-30
MIRANDA, Pedro C.; ANDRADE, Michelle Hanne S. de. Comparative Evaluation of GitHub Copilot and Amazon CodeWhisperer in Automatic Source Code Generation. In: SOFTWARE ENGINEERING UNDERGRADUATE RESEARCH COMPETITION - BRAZILIAN CONFERENCE ON SOFTWARE: THEORY AND PRACTICE (CBSOFT), 15., 2024, Curitiba/PR. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2024. p. 59-68. DOI: https://doi.org/10.5753/cbsoft_estendido.2024.4100.