ImprovMLCQ: A Feature-Enriched Dataset for Advancing Code Smell Detection

Joanne Carneiro; Jessica Ribas; Amanda Santana; Eduardo Figueiredo; Juliana Alves Pereira

doi:10.5753/sbcars.2025.13594

Joanne Carneiro PUC-Rio
Jessica Ribas PUC-Rio https://orcid.org/0000-0002-9294-9533
Amanda Santana UFMG
Eduardo Figueiredo UFMG
Juliana Alves Pereira PUC-Rio https://orcid.org/0000-0002-0799-2829

Abstract

Code smells are indicators of poor design choices in source code that negatively impact software quality. While manual detection of code smells is time-consuming, their automated detection requires high-quality datasets. This work evaluates ImprovMLCQ, an improved version of the Madeyski Lewowski Code Quest (MLCQ) dataset that incorporates an extensive list of features extracted with four tools (CK, PMD, Organic, and Designite), along with several project characteristics. Our goal is to leverage these features to gain deeper insight into the detection of four code smells (Long Method, Feature Envy, Data Class, and Blob), to assess the effectiveness of different Machine Learning (ML) and Deep Learning (DL) models, and to explore the impact of feature selection on predictive performance. We evaluate fifteen ML algorithms and four DL algorithms on ImprovMLCQ, applying various feature engineering and selection mechanisms to optimize predictive performance. Our results show that the enriched dataset significantly boosts the performance of ML and DL models.

Keywords: Code Smell, Software Quality, Maintainability, Machine Learning
