Low-Cost Machine Learning for Effective and Efficient Bad Smells Detection

J. S. L. Figuerêdo; V. T. Sarinho; R. T. Calumby

doi:10.5753/kdmile.2021.17468

J. S. L. Figuerêdo Universidade Estadual de Feira de Santana (UEFS) http://orcid.org/0000-0003-1892-3455
V. T. Sarinho Universidade Estadual de Feira de Santana (UEFS)
R. T. Calumby Universidade Estadual de Feira de Santana (UEFS) http://orcid.org/0000-0001-8515-265X

Abstract

Bad smells are characteristics of software that indicate a code or design problem, which can make an information system hard to understand, evolve, and maintain. To address this problem, different approaches, both manual and automated, have been proposed over the years, including, more recently, machine learning alternatives. However, despite the advances achieved, some machine learning techniques, such as feature selection, have not yet been effectively explored. Moreover, it is not clear to what extent the use of numerous source-code features is necessary for reasonable bad smell detection success. Therefore, in this work we propose an approach that uses low-cost machine learning for effective and efficient detection of bad smells through explicit feature selection. Our results showed that feature selection statistically improved the effectiveness of the models. In some cases, the models achieved statistical equivalence while relying on a highly reduced set of features. Indeed, with explicit feature selection, simpler models such as Naive Bayes became statistically equivalent to robust models such as Random Forest. Therefore, feature selection allowed the models to keep competitive or even superior effectiveness while also improving their efficiency, demanding fewer computational resources for source-code preprocessing, model training, and bad smell detection.

Keywords: bad smell, machine learning, feature selection, data mining
