Empirical Study: Detecting Code Smells with Machine Learning

Raimundo Alan Freire Moreira (UFC); Lucas José Lemos Braz (UFC); Fischer Jônatas Ferreira (UNIFEI); Márcio André Baima Amora (UFC)

DOI: https://doi.org/10.5753/cibse.2024.28455

Abstract

Detecting code smells during software development is important for improving software quality, and refactoring is essential for eliminating these indicators of problems. This study evaluates an empirical approach that trains five machine learning algorithms to detect code smells in software systems, using software metrics as input features. The results show that the machine learning approach performed excellently at detecting code smells, reaching an accuracy between 93.7% and 99.2%.
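The abstract names neither the five algorithms nor the specific metrics, so the following is only a minimal sketch of the general idea: each class is represented by a vector of software metrics, a classifier is trained on labeled examples, and its predictions on held-out classes are scored by accuracy, (TP + TN) / total. It assumes two common class-level metrics (LOC and WMC, chosen here for illustration) and uses a decision stump (a one-level decision tree) as a stand-in classifier. The names `ClassMetrics`, `train_stump`, and `predict` and all data values are hypothetical; this is not the authors' implementation or dataset.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class ClassMetrics:
    """Hypothetical metric vector for one class under analysis."""
    loc: int         # lines of code
    wmc: int         # weighted methods per class
    is_smelly: bool  # ground-truth label, e.g. "is a God Class"


def accuracy(predicted: List[bool], actual: List[bool]) -> float:
    """Fraction of predictions that match the ground truth: (TP + TN) / total."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


def predict(samples: List[ClassMetrics], metric: str, threshold: int) -> List[bool]:
    """Flag a class as smelly when the chosen metric is >= the threshold."""
    return [getattr(s, metric) >= threshold for s in samples]


def train_stump(samples: List[ClassMetrics],
                metrics: Sequence[str] = ("loc", "wmc")) -> Tuple[str, int]:
    """Decision stump: try every observed value of every metric as a
    threshold and keep the (metric, threshold) pair with the highest
    training accuracy (ties keep the first pair found)."""
    labels = [s.is_smelly for s in samples]
    best_metric, best_threshold, best_acc = metrics[0], 0, -1.0
    for m in metrics:
        for t in sorted({getattr(s, m) for s in samples}):
            acc = accuracy(predict(samples, m, t), labels)
            if acc > best_acc:
                best_metric, best_threshold, best_acc = m, t, acc
    return best_metric, best_threshold


# Hypothetical toy data for illustration only; not taken from the study.
train = [
    ClassMetrics(120, 8, False),
    ClassMetrics(300, 15, False),
    ClassMetrics(900, 45, True),
    ClassMetrics(1500, 60, True),
    ClassMetrics(200, 30, False),
]
test = [
    ClassMetrics(150, 10, False),
    ClassMetrics(1100, 50, True),
    ClassMetrics(700, 20, True),
    ClassMetrics(250, 35, False),
]

metric, threshold = train_stump(train)
test_acc = accuracy(predict(test, metric, threshold), [s.is_smelly for s in test])
print(metric, threshold, test_acc)  # prints: loc 900 0.75
```

The stump fits the training set perfectly but misclassifies one held-out class (LOC 700), which is why accuracy is measured on data not used for training. A real study would swap the stump for stronger classifiers and typically use cross-validation rather than a single split.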
