Investigating Software Developers’ Perception of LLM Adoption in Code Smell Refactoring
Abstract
This paper investigates developers’ perceptions of refactoring code smells using Large Language Models (LLMs). Through a study with 48 Java developers, we analyzed (i) their opinions on the use of these tools, (ii) which factors affect the prioritization of the refactoring order, and (iii) their perceptions of the quality of the code generated for refactorings. Our results indicate benefits of using LLMs in refactoring and in software development processes, including improvements in quality attributes, productivity, learning, and support during development. However, possible short- and long-term maladaptive tendencies were also highlighted, such as using low-quality code, counterproductive decision-making, and dependence on LLMs for development activities.
References
Ahmed, I., Mannan, U. A., Gopinath, R., and Jensen, C. (2015). An empirical study of design degradation: How software projects get worse over time. In 2015 ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM), pages 1–10.
AlOmar, E. A., Venkatakrishnan, A., Mkaouer, M. W., Newman, C., and Ouni, A. (2024). How to refactor this code? an exploratory study on developer-chatgpt refactoring conversations. In Proceedings of the 21st International Conference on Mining Software Repositories, MSR ’24, page 202–206, New York, NY, USA. Association for Computing Machinery.
Chang, Y., Wang, X., Wang, J., Wu, Y., Yang, L., Zhu, K., Chen, H., Yi, X., Wang, C., Wang, Y., Ye, W., Zhang, Y., Chang, Y., Yu, P. S., Yang, Q., and Xie, X. (2024). A survey on evaluation of large language models. ACM Trans. Intell. Syst. Technol., 15(3).
Corbin, J. and Strauss, A. (2014). Basics of Qualitative Research: Techniques and Procedures for Developing Grounded Theory. SAGE Publications, USA, 4th. edition.
Cordeiro, J., Noei, S., and Zou, Y. (2024). An empirical study on the code refactoring capability of large language models.
Danphitsanuphan, P. and Suwantada, T. (2012). Code smell detecting tool and code smell-structure bug relationship. In 2012 Spring Congress on Engineering and Technology, pages 1–5. IEEE.
dos Santos, H. M., Durelli, V. H., Souza, M., Figueiredo, E., da Silva, L. T., and Durelli, R. S. (2019). Cleangame: Gamifying the identification of code smells. In Proceedings of the XXXIII Brazilian Symposium on Software Engineering, pages 437–446.
Fan, A., Gokkaya, B., Harman, M., Lyubarskiy, M., Sengupta, S., Yoo, S., and Zhang, J. M. (2023). Large language models for software engineering: Survey and open problems. In 2023 IEEE/ACM International Conference on Software Engineering: Future of Software Engineering (ICSE-FoSE), pages 31–53.
Fowler, M. (2018). Refactoring: Improving the Design of Existing Code. Addison-Wesley Professional.
GitHub (2024). What is github copilot? Available at: [link]. Accessed: March 11, 2025.
Google (2023). Introducing gemini: our largest and most capable ai model. Available at: [link]. Accessed: March 11, 2025.
Kerievsky, J. (2005). Refactoring to Patterns. Addison-Wesley, Boston.
Kruchten, P., Nord, R. L., and Ozkaya, I. (2012). Technical debt: From metaphor to theory and practice. IEEE Software, 29(6):18–21.
Lacerda, G., Petrillo, F., Pimenta, M., and Guéhéneuc, Y. G. (2020). Code smells and refactoring: A tertiary systematic review of challenges and observations. Journal of Systems and Software, 167:110610.
Li, Z., Wang, C., Liu, Z., Wang, H., Chen, D., Wang, S., and Gao, C. (2023). Cctest: Testing and repairing code completion systems. In Proceedings of the 45th International Conference on Software Engineering (ICSE). IEEE/ACM.
Liu, P., Yuan, W., Fu, J., Jiang, Z., Hayashi, H., and Neubig, G. (2023). Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Comput. Surv., 55(9).
Madeyski, L. and Lewowski, T. (2020). Mlcq: Industry-relevant code smell data set. In Proceedings of the 24th International Conference on Evaluation and Assessment in Software Engineering, EASE ’20, page 342–347, New York, NY, USA. Association for Computing Machinery.
Martins, J., Bezerra, C., Uchôa, A., and Garcia, A. (2021). How do code smell co-occurrences removal impact internal quality attributes? a developers’ perspective. In Proceedings of the XXXV Brazilian Symposium on Software Engineering, SBES ’21, page 54–63, New York, NY, USA. Association for Computing Machinery.
Menolli, A., Strik, B., and Rodrigues, L. (2024). Teaching refactoring to improve code quality with chatgpt: An experience report in undergraduate lessons. In Proceedings of the XXIII Brazilian Symposium on Software Quality, SBQS ’24, page 563–574, New York, NY, USA. Association for Computing Machinery.
OpenAI (2024). Openai api documentation. Available at: [link]. Accessed: March 11, 2025.
Pan, S., Luo, L., Wang, Y., Chen, C., Wang, J., and Wu, X. (2024). Unifying large language models and knowledge graphs: A roadmap. IEEE Transactions on Knowledge and Data Engineering, 36(7):3580–3599.
Pomian, D., Bellur, A., Dilhara, M., Kurbatova, Z., Bogomolov, E., Bryksin, T., and Dig, D. (2024). Next-generation refactoring: Combining llm insights and ide capabilities for extract method. In 2024 IEEE International Conference on Software Maintenance and Evolution (ICSME), pages 275–287.
Sergeyuk, A., Lvova, O., Titov, S., Serova, A., Bagirov, F., Kirillova, E., and Bryksin, T. (2024). Reassessing java code readability models with a human-centered approach. In Proceedings of the 32nd IEEE/ACM International Conference on Program Comprehension, ICPC ’24, page 225–235, New York, NY, USA. Association for Computing Machinery.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., and Polosukhin, I. (2017). Attention is all you need. In Guyon, I., Luxburg, U. V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., and Garnett, R., editors, Advances in Neural Information Processing Systems, volume 30. Curran Associates, Inc.
Zhang, B., Liang, P., Feng, Q., Fu, Y., and Li, Z. (2024). Copilot-in-the-loop: Fixing code smells in copilot-generated python code using copilot. In Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering, ASE ’24, page 2230–2234, New York, NY, USA. Association for Computing Machinery.
Published
2025-07-20
How to Cite
FREITAS, Javel; PEREIRA, Guilherme; LIMA, Lara; SOUSA, Caio; FILHO, Edivar; SOUZA FILHO, José Cezar de; BEZERRA, Carla. Investigating Software Developers’ Perception of LLM Adoption in Code Smell Refactoring. In: PROCEEDINGS OF WORKSHOP ON SOCIAL, HUMAN AND ECONOMIC ASPECTS OF SOFTWARE (WASHES), 10., 2025, Maceió/AL. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 108-119. ISSN 2763-874X. DOI: https://doi.org/10.5753/washes.2025.8577.
