DOI: 10.1145/3422392.3422427

Applying Machine Learning to Customized Smell Detection: A Multi-Project Study

Published: 21 December 2020

ABSTRACT

Code smells are symptoms of poor implementation choices that may hamper software maintainability. Hence, code smells should be detected as early as possible to avoid software quality degradation. Unfortunately, detecting code smells is not a trivial task. Preliminary studies have concluded that machine learning (ML) techniques are a promising way to better support smell detection. However, these techniques are hard to customize for the early and accurate detection of specific smell types. Moreover, ML techniques usually require numerous code examples for training (composing a relevant dataset) in order to achieve satisfactory accuracy. Such a dependency on a large validated dataset is impractical and leads to late detection of code smells. Thus, a prevailing challenge is the early, customized detection of code smells given the typically limited training data. In this direction, this paper reports a study in which we collected code smells, from ten active projects, that were actually refactored by developers, unlike studies that rely on code smells inferred by researchers. These smells were used to evaluate the accuracy of seven ML techniques for the early detection of code smells. Because we consider only smells deemed important by developers, the ML techniques can customize detection to focus on the smells observed as relevant in the investigated systems. The results show that all the analyzed techniques are sensitive to the smell type and obtained good results for most of them, especially JRip and Random Forest. We also observed that the ML techniques did not need a high number of examples to reach their best accuracy. This finding implies that ML techniques can be successfully used for early detection of smells without depending on the curation of a large dataset.
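The detection setup the abstract describes can be sketched as follows. This is an illustrative reconstruction, not the paper's pipeline: it uses scikit-learn's Random Forest (the study evaluated techniques such as JRip and Random Forest, typically via WEKA), synthetic class-level metric values, and hypothetical feature names, with a deliberately small training set to mirror the limited-data setting.

```python
# Hedged sketch: training a Random Forest to flag a smell such as God Class
# from class-level code metrics. All data and feature names are synthetic
# and hypothetical; only the overall workflow reflects the abstract.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(42)

# Hypothetical metrics per class:
# [lines_of_code, num_methods, coupling, lack_of_cohesion]
n = 60  # deliberately small, mirroring the limited-training-data setting
smelly = rng.normal(loc=[800, 40, 25, 0.8], scale=[150, 8, 5, 0.1], size=(n // 2, 4))
clean = rng.normal(loc=[150, 8, 5, 0.3], scale=[60, 3, 2, 0.1], size=(n // 2, 4))
X = np.vstack([smelly, clean])
y = np.array([1] * (n // 2) + [0] * (n // 2))  # 1 = smelly, 0 = not smelly

# Cross-validated F1, a common accuracy measure for imbalanced smell data
clf = RandomForestClassifier(n_estimators=100, random_state=0)
scores = cross_val_score(clf, X, y, cv=5, scoring="f1")
print(f"Mean F1 across folds: {scores.mean():.2f}")
```

In practice the labels would come from smells that developers actually refactored (as in the study), not from synthetic clusters, and each project would yield its own customized training set.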


Published in

SBES '20: Proceedings of the XXXIV Brazilian Symposium on Software Engineering
October 2020, 901 pages
ISBN: 9781450387538
DOI: 10.1145/3422392
Copyright © 2020 ACM
Publisher: Association for Computing Machinery, New York, NY, United States
Overall Acceptance Rate: 147 of 427 submissions, 34%
