ABSTRACT
Code smells are considered symptoms of poor implementation choices that may hamper software maintainability. Hence, code smells should be detected as early as possible to avoid software quality degradation. Unfortunately, detecting code smells is not a trivial task. Preliminary studies have concluded that machine learning (ML) techniques are a promising way to better support smell detection. However, these techniques are hard to customize for the early and accurate detection of specific smell types. Moreover, ML techniques usually require numerous code examples for training (a sizable dataset) in order to achieve satisfactory accuracy. Such a dependency on a large validated dataset is impractical and leads to late detection of code smells. Thus, a prevailing challenge is the early, customized detection of code smells given the typically limited training data. In this direction, this paper reports a study in which we collected, from ten active projects, code smells that were actually refactored by developers, unlike studies that rely on code smells inferred by researchers. We used these smells to evaluate the accuracy of seven ML techniques in the early detection of code smells. Because we consider only smells that developers deemed important, the ML techniques can customize the detection to focus on the smells observed as relevant in the investigated systems. The results show that all the analyzed techniques are sensitive to the type of smell and achieved good results for most smell types, especially JRip and Random Forest. We also observed that the ML techniques did not need a large number of examples to reach their best accuracy. This finding implies that ML techniques can be successfully used for early detection of smells without depending on the curation of a large dataset.
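The setup described above, training a classifier such as Random Forest on code metrics and checking how accuracy evolves as training examples accumulate, can be sketched as follows. This is a minimal illustration, not the paper's actual pipeline (which used WEKA): the metric names (LOC, WMC, CBO, LCOM), the synthetic data generator, and the toy labeling rule are all assumptions made for the example.

```python
# Hedged sketch: train a Random Forest smell detector on code metrics and
# observe how F1 changes with the number of labeled training examples.
# The metrics, data, and labels below are synthetic stand-ins.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)

def synth(n):
    # Synthetic class-level "metrics": [LOC, WMC, CBO, LCOM], scaled to [0, 1].
    X = rng.uniform(0, 1, size=(n, 4))
    # Toy proxy label: classes with uniformly high metrics are "smelly".
    y = (X.sum(axis=1) > 2.0).astype(int)
    return X, y

X_test, y_test = synth(500)
for n_train in (25, 50, 100, 200):
    X_tr, y_tr = synth(n_train)
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    clf.fit(X_tr, y_tr)
    print(f"{n_train:4d} training examples -> "
          f"F1 = {f1_score(y_test, clf.predict(X_test)):.2f}")
```

On data this simple, F1 tends to plateau well before the largest training size, which mirrors the abstract's observation that the techniques did not need many examples to reach their best accuracy.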
Applying Machine Learning to Customized Smell Detection: A Multi-Project Study