Assessing the Use of a Code Generation Assistant in Professional Software Development: An Experience Report

Abstract


AI-assisted code generation tools, such as GitHub Copilot, have gained attention in the software industry due to their potential to support development tasks. This paper presents an experience report from a case study on the adoption of GitHub Copilot by a team of developers at the Superintendence of Information Technology of the Federal University of Piauí (STI/UFPI), with the aim of evaluating the effects of the tool in a real-world environment involving legacy system maintenance. The study was carried out over three months, comparing periods with and without the use of the tool. Metrics such as the code change rate were analyzed, along with the developers' perceptions. The results indicate an increase in the code change rate for most developers, particularly in bug-fixing tasks, while customization tasks showed a more nuanced outcome. Participants also gave positive feedback regarding Copilot's usefulness. Finally, the paper discusses the challenges faced during the study's execution and its methodological limitations, and proposes directions for future research on the impact of AI assistants in diverse software development contexts.

Keywords: GitHub Copilot, Code Generation Tools, Empirical Study, Legacy Systems, Artificial Intelligence, Experience Report

Published
2025-11-04
OSORIO, Luiz Fernando Mendes; SANTOS NETO, Pedro de A. dos; AVELINO, Guilherme; LIRA, Werney Ayala Luz. Assessing the Use of a Code Generation Assistant in Professional Software Development: An Experience Report. In: BRAZILIAN SOFTWARE QUALITY SYMPOSIUM (SBQS), 24., 2025, São José dos Campos/SP. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 421-430. DOI: https://doi.org/10.5753/sbqs.2025.13845.