Say my name! An empirical study on the pronounceability of identifier names

Remo Gresta (UFSJ); Elder Cirilo (UFSJ)

DOI: https://doi.org/10.5753/vem.2021.17218

Abstract

Identifiers account for approximately two-thirds of the elements in source code, and their names directly affect code comprehension. Intention-revealing names make code easier to understand, especially in code review sessions, where developers examine each other's code for mistakes. We argue, however, that names should be pronounceable as well as understandable so that developers can review and discuss code effectively. We therefore carried out an empirical study of 40 open-source projects to explore developers' naming practices with respect to word complexity and pronounceability. We applied the Word Complexity Measure (WCM) to identify complex names and analyzed the phonetic similarity between names and hard-to-pronounce English words. We observed that most of the analyzed names are composed, at least in part, of hard-to-pronounce words, and that the overall word complexity score of the projects tends to be significant. Finally, the results show that code location affects word complexity: names declared in small scopes tend to be simpler than names declared in large scopes.

Keywords: Identifier names, Phonetic Algorithm, Pronounceable names
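
The abstract does not say how names were split into words or which phonetic algorithm was used for the similarity analysis, so the following is only a minimal sketch under stated assumptions. It splits an identifier into words and uses classic Soundex as a stand-in phonetic code to flag words that sound like entries in a small, purely illustrative list of hard-to-pronounce words. The function names, the word list, and the choice of Soundex are all hypothetical, and the sketch does not implement the Word Complexity Measure.

```python
import re

# Classic Soundex digit groups; vowels, h, w, and y carry no digit.
_CODES = {}
for _letters, _digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                         ("l", "4"), ("mn", "5"), ("r", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def split_identifier(name):
    """Split a camelCase / snake_case identifier into lowercase words."""
    words = []
    for part in re.split(r"[_\W\d]+", name):
        # Acronym runs (e.g. HTTP) stay together; other words start at a capital.
        words += re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+", part)
    return [w.lower() for w in words]


def soundex(word):
    """Return the four-character Soundex code of a word ("" if no letters)."""
    word = "".join(ch for ch in word.lower() if "a" <= ch <= "z")
    if not word:
        return ""
    encoded = word[0].upper()
    prev = _CODES.get(word[0], "")
    for ch in word[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = _CODES.get(ch, "")
        if code and code != prev:
            encoded += code
        prev = code  # a vowel resets prev, so a repeated code is kept
    return (encoded + "000")[:4]


# Illustrative list only; not the word list used in the study.
HARD_WORDS = ["colonel", "squirrel", "rural", "sixth"]


def hard_sounding_words(identifier, hard_words=HARD_WORDS):
    """Return the words of an identifier whose Soundex code matches a hard word."""
    hard_codes = {soundex(w) for w in hard_words}
    return [w for w in split_identifier(identifier) if soundex(w) in hard_codes]
```

For example, `hard_sounding_words("colonelCount")` splits the name into `colonel` and `count` and keeps only the first, since its code matches an entry in the list. Soundex is coarse (it only compares words that share a first letter), so a finer algorithm such as Double Metaphone may suit this kind of analysis better.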