Detecting Code Smells in JavaScript: An Annotated Dataset for Software Quality Analysis

Abstract


The quality of source code attained during development is an important factor in the costs incurred in later stages of the software lifecycle. Among the most detrimental quality problems are code smells: violations of programming principles and good practices that negatively affect the maintainability and evolution of computer programs. Much effort has gone into creating code smell detection tools over the last decades. A promising approach relies on machine learning (ML) algorithms for automated smell detection. These algorithms usually need datasets of labeled instances that indicate the presence or absence of smells in programming constructs such as classes and methods. Although many studies use ML for code smell detection, few adopt this approach for programming languages other than Java. Even widely popular languages like JavaScript have few or no studies on ML models for smell detection, despite their lexical, structural, and paradigm differences from Java. A symptom of this gap is the absence of standard code smell datasets for JavaScript in the literature. This work presents a new dataset for code smell detection in JavaScript software, focused on God Class and Long Method, two of the most prevalent and harmful code smells. We describe the strategy used to construct the dataset, its characteristics, and a few preliminary experiments applying ML models for code smell detection to it.

Keywords: dataset, code smells, JavaScript, machine learning, classification

Published
30/09/2024
SARAFIM, Diego S.; DELGADO, Karina V.; CORDEIRO, Daniel. Detecting Code Smells in JavaScript: An Annotated Dataset for Software Quality Analysis. In: SIMPÓSIO BRASILEIRO DE ENGENHARIA DE SOFTWARE (SBES), 38., 2024, Curitiba/PR. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2024. p. 313-322. DOI: https://doi.org/10.5753/sbes.2024.3432.