How Readable Is LLM-Generated Code Snippets? A Comparison of ChatGPT, DeepSeek, and Gemini
Abstract
Developers often search for reusable code snippets on the Web. With the increasing adoption of Large Language Models (LLMs) to support programming tasks and the growing number of available models, this work evaluates the readability of 981 code snippets generated by ChatGPT, DeepSeek, and Gemini (327 per model) by analyzing the warnings reported by static analysis tools such as SonarLint. Additionally, we present a preliminary approach that combines SonarLint recommendations with LLMs to automatically refactor code snippets, with the goal of improving their readability. The results show that ChatGPT produces the code with the fewest readability warnings according to SonarLint. All three LLMs were also able to remove readability warnings in more than 60% of the affected code snippets. However, challenges remain when combining LLMs with static analysis tools, particularly in understanding the context of certain rules and in avoiding the removal of relevant code. The insights from this study reveal opportunities for deeper integration between static analysis tools and LLMs.
References
Al Madi, N. (2023). How readable is model-generated code? Examining readability and visual inspection of GitHub Copilot. In Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering, ASE ’22, New York, NY, USA. Association for Computing Machinery.
ChatGPT (2025). ChatGPT web page. [link]. Accessed on May 10, 2025.
Crokage (2025). CROKAGE tool web page. [link]. Accessed on May 10, 2025.
da Silva, R. F. G., Roy, C. K., Rahman, M. M., Schneider, K. A., Paixão, K. V. R., de Carvalho Dantas, C. E., and de Almeida Maia, M. (2020). CROKAGE: effective solution recommendation for programming tasks by leveraging crowd knowledge. Empir. Softw. Eng., 25(6):4707–4758.
Dantas, C. E., Rocha, A. M., and Maia, M. A. (2023). Assessing the readability of chatgpt code snippet recommendations: A comparative study. In Proceedings of the XXXVII Brazilian Symposium on Software Engineering, SBES ’23, page 283–292, New York, NY, USA. Association for Computing Machinery.
DeepSeek (2025). DeepSeek web page. [link]. Accessed on February 6, 2025.
Fernandes, G., Maia, M. A., and Dantas, C. E. C. (2025). How Readable Is LLM-Generated Code Snippet? A Comparison of ChatGPT, DeepSeek, and Gemini - Replication Package. DOI: 10.5281/zenodo.15803308.
G1 Tecnologia (2025). App chinês DeepSeek supera ChatGPT nos EUA e derruba ações de empresas ligadas à IA [Chinese app DeepSeek overtakes ChatGPT in the US and drags down shares of AI companies]. [link]. Accessed on February 6, 2025.
Google (2025). Gemini web page. [link]. Accessed on February 6, 2025.
Holmes, R., Cottrell, R., Walker, R. J., and Denzinger, J. (2009). The end-to-end use of source code examples: An exploratory study. In 2009 IEEE International Conference on Software Maintenance, pages 555–558.
Hora, A. (2021a). Googling for software development: What developers search for and what they find. In 2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR), pages 317–328.
Hora, A. C. (2021b). APISonar: Mining API usage examples. Software: Practice and Experience, 51:319–352.
Jaoua, I., Sghaier, O. B., and Sahraoui, H. (2025). Combining Large Language Models with Static Analyzers for Code Review Generation. In 2025 IEEE/ACM 22nd International Conference on Mining Software Repositories (MSR), pages 174–186, Los Alamitos, CA, USA. IEEE Computer Society.
Keivanloo, I., Rilling, J., and Zou, Y. (2014). Spotting working code examples. In Proceedings of the 36th International Conference on Software Engineering, ICSE 2014, page 664–675, New York, NY, USA. Association for Computing Machinery.
Loriot, B., Madeiral, F., and Monperrus, M. (2022). Styler: learning formatting conventions to repair checkstyle violations. Empirical Software Engineering, 27.
Marcilio, D., Bonifácio, R., Monteiro, E., Canedo, E., Luz, W., and Pinto, G. (2019). Are static analysis violations really fixed? a closer look at realistic usage of sonarqube. In 2019 IEEE/ACM 27th International Conference on Program Comprehension (ICPC), pages 209–219.
Minelli, R., Mocci, A., and Lanza, M. (2015). I know what you did last summer: An investigation of how developers spend their time. In Proceedings of the 2015 IEEE 23rd International Conference on Program Comprehension, ICPC ’15, page 25–35. IEEE Press.
Niu, H., Keivanloo, I., and Zou, Y. (2017). Learning to rank code examples for code search engines. Empirical Softw. Engg., 22(1):259–291.
Piantadosi, V., Fierro, F., Scalabrino, S., Serebrenik, A., and Oliveto, R. (2020). How does code readability change during software evolution? Empirical Software Engineering, 25:1–39.
Romano, S., Zampetti, F., Baldassarre, M. T., Di Penta, M., and Scanniello, G. (2022). Do static analysis tools affect software quality when using test-driven development? In Empirical Software Engineering and Measurement, ESEM ’22.
Sadowski, C., Stolee, K. T., and Elbaum, S. (2015). How developers search for code: a case study. In Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering, ESEC/FSE 2015, page 191–201. Association for Computing Machinery, New York, NY, USA.
Silva, J., Dantas, C., and Maia, M. (2024). What developers ask to ChatGPT in GitHub pull requests? An exploratory study. In Anais do XII Workshop de Visualização, Evolução e Manutenção de Software, pages 125–136, Porto Alegre, RS, Brasil. SBC.
Sobania, D., Briesch, M., Hanna, C., and Petke, J. (2023). An Analysis of the Automatic Bug Fixing Performance of ChatGPT. In 2023 IEEE/ACM International Workshop on Automated Program Repair (APR), pages 23–30, Los Alamitos, CA, USA. IEEE Computer Society.
SonarLint (2025). SonarLint web page. [link]. Accessed on February 6, 2025.
Tufano, R., Mastropaolo, A., Pepe, F., Dabić, O., Penta, M. D., and Bavota, G. (2024). Unveiling ChatGPT’s usage in open source projects: A mining-based study.
Zhang, Z., Xing, Z., Zhao, D., Xu, X., Zhu, L., and Lu, Q. (2024). Automated refactoring of non-idiomatic python code with pythonic idioms. IEEE Transactions on Software Engineering, PP:1–22.
Published
2025-09-22
How to Cite
FERNANDES, Giovanna; MAIA, Marcelo A.; DANTAS, Carlos Eduardo C. How Readable Is LLM-Generated Code Snippets? A Comparison of ChatGPT, DeepSeek, and Gemini. In: WORKSHOP ON SOFTWARE VISUALIZATION, EVOLUTION AND MAINTENANCE (VEM), 13., 2025, Recife/PE. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 13-24. DOI: https://doi.org/10.5753/vem.2025.14275.
