Do LLMs Suggest Consistent Identifiers? An Empirical Study on GitHub Pull Requests

Julyanara R. Silva; Marcelo A. Maia; Carlos Eduardo C. Dantas

doi:10.5753/ise.2025.14870

Julyanara R. Silva (IFMT)
Marcelo A. Maia (UFU)
Carlos Eduardo C. Dantas (IFTM)

Abstract

Appropriate identifier naming is crucial in source code: names can account for up to 70% of all characters and play a key role in helping developers understand the intent behind the code. Renaming is also among the most common refactoring operations that developers perform. Although static analysis tools can detect violations of naming conventions, tools capable of suggesting names that are semantically aligned with the purpose of the code are still lacking. In this study, we collected 152 Java and 168 Python instances of GitHub pull requests (PRs) that improved identifier names. We investigated whether three Large Language Models (LLMs), ChatGPT, DeepSeek, and Gemini, can suggest names that are semantically consistent with those chosen by developers. We also compared three temperature settings, ranging from deterministic to creative, to assess how this parameter influences naming suggestions. Our results show that ChatGPT slightly outperforms the other models for Python identifiers, exactly matching the developers’ chosen names in 16.2% of the cases and achieving high semantic similarity in 65.1%. Overall, all three LLMs performed better in Python than in Java. Regarding temperature, the moderate setting yielded slightly better results in Python, while the creative setting performed slightly better in Java. These findings suggest that tasks such as identifier naming may benefit from different parameter configurations than those typically used in more deterministic tasks such as code refinement.
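To make the two reported measures concrete, the following is a minimal sketch of how an exact-match rate and a "high similarity" rate could be computed over pairs of suggested and developer-chosen names. It is not the study's pipeline: token-level Jaccard overlap is used here only as a simple stand-in for an embedding-based semantic similarity, and the function names, the 0.5 threshold, and the sample pairs are all hypothetical.

```python
import re


def split_identifier(name: str) -> list[str]:
    """Split a camelCase or snake_case identifier into lowercase word tokens."""
    tokens: list[str] = []
    for part in re.split(r"[_\W]+", name):
        # Acronym runs (HTTP), capitalized or lowercase words, and digit runs.
        tokens.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", part))
    return [t.lower() for t in tokens]


def exact_match(suggested: str, chosen: str) -> bool:
    """Strict string equality between the suggestion and the developer's name."""
    return suggested == chosen


def token_jaccard(suggested: str, chosen: str) -> float:
    """Jaccard overlap of word tokens; an assumed stand-in for semantic similarity."""
    a, b = set(split_identifier(suggested)), set(split_identifier(chosen))
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def summarize(pairs: list[tuple[str, str]], threshold: float = 0.5) -> dict[str, float]:
    """Return the share of exact matches and of pairs at or above the threshold."""
    if not pairs:
        raise ValueError("pairs must not be empty")
    n = len(pairs)
    exact = sum(exact_match(s, c) for s, c in pairs)
    similar = sum(token_jaccard(s, c) >= threshold for s, c in pairs)
    return {"exact_rate": exact / n, "similar_rate": similar / n}


# Hypothetical (suggested, developer-chosen) pairs, for illustration only.
sample_pairs = [
    ("userName", "userName"),
    ("max_retries", "maxRetries"),
    ("tmp", "retryDelay"),
    ("fileList", "file_paths"),
]
report = summarize(sample_pairs)
```

Token overlap only rewards shared words, so it would miss synonyms such as `remove` and `delete`; an embedding-based measure is better suited to capturing that kind of semantic closeness.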

Keywords: Code identifiers, LLM, ChatGPT, DeepSeek, Gemini, Pull Request
