Investigating Software Developers’ Perception of LLM Adoption in Code Smell Refactoring
Abstract
This paper investigates developers’ perceptions of refactoring code smells using Large Language Models (LLMs). Through a study with 48 Java developers, we analyzed (i) their opinions on the use of these tools, (ii) which factors affect the prioritization of the refactoring order, and (iii) their perceptions of the quality of the code generated for refactorings. Our results indicate benefits of using LLMs in refactoring and in software development processes, including improvements in quality attributes, productivity, learning, and support during development. However, possible short- and long-term maladaptive tendencies were also highlighted, such as using low-quality code, counterproductive decision-making, and dependence on LLMs for development activities.
References
Ahmed, I., Mannan, U. A., Gopinath, R., and Jensen, C. (2015). An empirical study of design degradation: How software projects get worse over time. In 2015 ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM), pages 1–10.
AlOmar, E. A., Venkatakrishnan, A., Mkaouer, M. W., Newman, C., and Ouni, A. (2024). How to refactor this code? an exploratory study on developer-chatgpt refactoring conversations. In Proceedings of the 21st International Conference on Mining Software Repositories, MSR ’24, page 202–206, New York, NY, USA. Association for Computing Machinery.
Chang, Y., Wang, X., Wang, J., Wu, Y., Yang, L., Zhu, K., Chen, H., Yi, X., Wang, C., Wang, Y., Ye, W., Zhang, Y., Chang, Y., Yu, P. S., Yang, Q., and Xie, X. (2024). A survey on evaluation of large language models. ACM Trans. Intell. Syst. Technol., 15(3).
Corbin, J. and Strauss, A. (2014). Basics of Qualitative Research: Techniques and Procedures for Developing Grounded Theory. SAGE Publications, USA, 4th. edition.
Cordeiro, J., Noei, S., and Zou, Y. (2024). An empirical study on the code refactoring capability of large language models.
Danphitsanuphan, P. and Suwantada, T. (2012). Code smell detecting tool and code smell-structure bug relationship. In 2012 Spring Congress on Engineering and Technology, pages 1–5. IEEE.
dos Santos, H. M., Durelli, V. H., Souza, M., Figueiredo, E., da Silva, L. T., and Durelli, R. S. (2019). Cleangame: Gamifying the identification of code smells. In Proceedings of the XXXIII Brazilian Symposium on Software Engineering, pages 437–446.
Fan, A., Gokkaya, B., Harman, M., Lyubarskiy, M., Sengupta, S., Yoo, S., and Zhang, J. M. (2023). Large language models for software engineering: Survey and open problems. In 2023 IEEE/ACM International Conference on Software Engineering: Future of Software Engineering (ICSE-FoSE), pages 31–53.
Fowler, M. (2018). Refactoring: Improving the Design of Existing Code. Addison-Wesley Professional.
GitHub (2024). What is github copilot? Available at: [link]. Accessed: March 11, 2025.
Google (2023). Introducing gemini: our largest and most capable ai model. Available at: [link]. Accessed: March 11, 2025.
Kerievsky, J. (2005). Refactoring to Patterns. Addison-Wesley, Boston.
Kruchten, P., Nord, R. L., and Ozkaya, I. (2012). Technical debt: From metaphor to theory and practice. IEEE Software, 29(6):18–21.
Lacerda, G., Petrillo, F., Pimenta, M., and Guéhéneuc, Y. G. (2020). Code smells and refactoring: A tertiary systematic review of challenges and observations. Journal of Systems and Software, 167:110610.
Li, Z., Wang, C., Liu, Z., Wang, H., Chen, D., Wang, S., and Gao, C. (2023). Cctest: Testing and repairing code completion systems. In Proceedings of the 45th International Conference on Software Engineering (ICSE). IEEE/ACM.
Liu, P., Yuan, W., Fu, J., Jiang, Z., Hayashi, H., and Neubig, G. (2023). Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Comput. Surv., 55(9).
Madeyski, L. and Lewowski, T. (2020). Mlcq: Industry-relevant code smell data set. In Proceedings of the 24th International Conference on Evaluation and Assessment in Software Engineering, EASE ’20, page 342–347, New York, NY, USA. Association for Computing Machinery.
Martins, J., Bezerra, C., Uchôa, A., and Garcia, A. (2021). How do code smell co-occurrences removal impact internal quality attributes? a developers’ perspective. In Proceedings of the XXXV Brazilian Symposium on Software Engineering, SBES ’21, page 54–63, New York, NY, USA. Association for Computing Machinery.
Menolli, A., Strik, B., and Rodrigues, L. (2024). Teaching refactoring to improve code quality with chatgpt: An experience report in undergraduate lessons. In Proceedings of the XXIII Brazilian Symposium on Software Quality, SBQS ’24, page 563–574, New York, NY, USA. Association for Computing Machinery.
OpenAI (2024). Openai api documentation. Available at: [link]. Accessed: March 11, 2025.
Pan, S., Luo, L., Wang, Y., Chen, C., Wang, J., and Wu, X. (2024). Unifying large language models and knowledge graphs: A roadmap. IEEE Transactions on Knowledge and Data Engineering, 36(7):3580–3599.
Pomian, D., Bellur, A., Dilhara, M., Kurbatova, Z., Bogomolov, E., Bryksin, T., and Dig, D. (2024). Next-generation refactoring: Combining llm insights and ide capabilities for extract method. In 2024 IEEE International Conference on Software Maintenance and Evolution (ICSME), pages 275–287.
Sergeyuk, A., Lvova, O., Titov, S., Serova, A., Bagirov, F., Kirillova, E., and Bryksin, T. (2024). Reassessing java code readability models with a human-centered approach. In Proceedings of the 32nd IEEE/ACM International Conference on Program Comprehension, ICPC ’24, page 225–235, New York, NY, USA. Association for Computing Machinery.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., and Polosukhin, I. (2017). Attention is all you need. In Guyon, I., Luxburg, U. V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., and Garnett, R., editors, Advances in Neural Information Processing Systems, volume 30. Curran Associates, Inc.
Zhang, B., Liang, P., Feng, Q., Fu, Y., and Li, Z. (2024). Copilot-in-the-loop: Fixing code smells in copilot-generated python code using copilot. In Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering, ASE ’24, page 2230–2234, New York, NY, USA. Association for Computing Machinery.
Published
2025-07-20
How to Cite
FREITAS, Javel; PEREIRA, Guilherme; LIMA, Lara; SOUSA, Caio; FILHO, Edivar; SOUZA FILHO, José Cezar de; BEZERRA, Carla. Investigating Software Developers’ Perception of LLM Adoption in Code Smell Refactoring. In: PROCEEDINGS OF WORKSHOP ON SOCIAL, HUMAN AND ECONOMIC ASPECTS OF SOFTWARE (WASHES), 10., 2025, Maceió/AL. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 108-119. ISSN 2763-874X. DOI: https://doi.org/10.5753/washes.2025.8577.
