An Evaluation of the Impact of Code Generation Tools on Software Development

Abstract

Context: The rise of AI-assisted tools such as GitHub Copilot promises to improve productivity in software development, raising questions about their practical impact on developer performance and code quality. Problem: Although AI-assisted tools are promoted as productivity enhancers, empirical evidence on their real-world impact remains limited. This study investigates whether such tools improve programming workflows, focusing on developers with varying experience levels and examining effects on code quality and efficiency. Solution: An empirical study assesses Copilot's effectiveness through experiments with student developers, focusing on task efficiency and code correctness, guided by indicators established in the literature. IS Theory: Grounded in Task-Technology Fit (TTF) theory, this research explores the extent to which Copilot aligns with programming tasks and meets user needs, ensuring its relevance to practical development scenarios. Method: A literature review identified key indicators for evaluating AI-assisted code generation, centered on task completion time and code correctness. Data collection involved task-based experiments with students, followed by quantitative analysis comparing performance with and without Copilot. Summary of Results: Findings indicate that Copilot can significantly reduce task completion time. However, no statistically significant differences were observed in code correctness, suggesting that while Copilot improves efficiency, its suggestions require careful review to ensure quality. Contributions and Impact on IS Field: This study contributes empirical insights into the advantages and challenges of AI-assisted coding. The results inform academia and industry about best practices for adopting such tools effectively, highlighting their potential to accelerate development while underscoring the continued need for human oversight.

Keywords: AI-assisted code generation, Copilot, empirical evaluation, developer performance
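The quantitative comparison described in the abstract (task completion time with and without Copilot) can be sketched as a simple two-sample significance test. The sketch below uses a permutation test on hypothetical timing data; the study's actual statistical procedure and data are not specified in this abstract, so both the test choice and the numbers are illustrative assumptions only.

```python
import random
import statistics

def permutation_test(a, b, n_iter=10_000, seed=42):
    """Two-sided permutation test on the difference of means.

    Returns an approximate p-value for the null hypothesis that
    samples a and b are drawn from the same distribution.
    """
    rng = random.Random(seed)
    observed = abs(statistics.mean(a) - statistics.mean(b))
    pooled = list(a) + list(b)
    count = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)  # reassign measurements to groups at random
        perm_diff = abs(statistics.mean(pooled[:len(a)])
                        - statistics.mean(pooled[len(a):]))
        if perm_diff >= observed:
            count += 1
    return count / n_iter

# Hypothetical completion times in minutes (NOT the study's data)
with_copilot = [12, 15, 9, 14, 11, 10, 13, 12]
without_copilot = [18, 22, 16, 20, 17, 19, 21, 15]

p = permutation_test(with_copilot, without_copilot)
print(f"p-value: {p:.4f}")  # a small p suggests a real time difference
```

A permutation test makes no normality assumption, which suits small samples of task times; the same comparison applied to correctness scores would, per the abstract's findings, be expected to yield a non-significant result.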

Published
May 19, 2025
OSORIO, Luiz Fernando Mendes; SANTOS NETO, Pedro de A. dos; AVELINO, Guilherme; LIRA, Werney Ayala Luz. An Evaluation of the Impact of Code Generation Tools on Software Development. In: SIMPÓSIO BRASILEIRO DE SISTEMAS DE INFORMAÇÃO (SBSI), 21., 2025, Recife/PE. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 625-634. DOI: https://doi.org/10.5753/sbsi.2025.246605.
