DOI: 10.1145/3422392.3422427

Applying Machine Learning to Customized Smell Detection: A Multi-Project Study

Published: 21 December 2020

ABSTRACT

Code smells are symptoms of poor implementation choices that may hamper software maintainability. Hence, code smells should be detected as early as possible to avoid software quality degradation. Unfortunately, detecting code smells is not a trivial task. Preliminary studies have concluded that machine learning (ML) techniques are a promising way to better support smell detection. However, these techniques are hard to customize for the early and accurate detection of specific smell types. Moreover, ML techniques usually require numerous code examples for training (composing a relevant dataset) in order to achieve satisfactory accuracy. Such a dependency on a large validated dataset is impractical and leads to late detection of code smells. Thus, a prevailing challenge is the early, customized detection of code smells given the typically limited training data. In this direction, this paper reports a study in which we collected code smells, from ten active projects, that were actually refactored by developers, unlike studies that rely on code smells inferred by researchers. These smells were used to evaluate the accuracy of seven ML techniques for the early detection of code smells. Because we consider only smells deemed important by developers, the ML techniques can customize detection to focus on the smells observed as relevant in the investigated systems. The results show that all the analyzed techniques are sensitive to the smell type and obtained good results for most of them, especially JRip and Random Forest. We also observed that the ML techniques did not need a high number of examples to reach their best accuracy. This finding implies that ML techniques can be successfully used for early detection of smells without depending on the curation of a large dataset.
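The detection setup the abstract describes can be sketched as follows. This is an illustrative reconstruction, not the paper's pipeline: it uses scikit-learn's Random Forest (the study evaluated techniques such as JRip and Random Forest, typically via WEKA), synthetic class-level metric values, and hypothetical feature names, with a deliberately small training set to mirror the limited-data setting.

```python
# Hedged sketch: training a Random Forest to flag a smell such as God Class
# from class-level code metrics. All data and feature names are synthetic
# and hypothetical; only the overall workflow reflects the abstract.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(42)

# Hypothetical metrics per class:
# [lines_of_code, num_methods, coupling, lack_of_cohesion]
n = 60  # deliberately small, mirroring the limited-training-data setting
smelly = rng.normal(loc=[800, 40, 25, 0.8], scale=[150, 8, 5, 0.1], size=(n // 2, 4))
clean = rng.normal(loc=[150, 8, 5, 0.3], scale=[60, 3, 2, 0.1], size=(n // 2, 4))
X = np.vstack([smelly, clean])
y = np.array([1] * (n // 2) + [0] * (n // 2))  # 1 = smelly, 0 = not smelly

# Cross-validated F1, a common accuracy measure for imbalanced smell data
clf = RandomForestClassifier(n_estimators=100, random_state=0)
scores = cross_val_score(clf, X, y, cv=5, scoring="f1")
print(f"Mean F1 across folds: {scores.mean():.2f}")
```

In practice the labels would come from smells that developers actually refactored (as in the study), not from synthetic clusters, and each project would yield its own customized training set.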


Published in

SBES '20: Proceedings of the XXXIV Brazilian Symposium on Software Engineering
October 2020, 901 pages
ISBN: 9781450387538
DOI: 10.1145/3422392
Copyright © 2020 ACM
Publisher: Association for Computing Machinery, New York, NY, United States
Overall Acceptance Rate: 147 of 427 submissions, 34%
