Applying Machine Learning to Customized Smell Detection: A Multi-Project Study

  • Daniel Oliveira PUC-Rio
  • Wesley K. G. Assunção UTFPR
  • Leonardo Souza Carnegie Mellon University
  • Willian Oizumi PUC-Rio
  • Alessandro Garcia PUC-Rio
  • Baldoino Fonseca UFAL

Abstract


Code smells are considered symptoms of poor implementation choices that may hamper software maintainability. Hence, code smells should be detected as early as possible to avoid software quality degradation. Unfortunately, detecting code smells is not a trivial task. Preliminary studies have concluded that machine learning (ML) techniques are a promising way to better support smell detection. However, these techniques are hard to customize for early and accurate detection of specific smell types. Moreover, ML techniques usually require numerous code examples (composing a relevant training dataset) to achieve satisfactory accuracy. Such a dependency on a large validated dataset is impractical and leads to late detection of code smells. Thus, a prevailing challenge is the early, customized detection of code smells given the typically limited training data. To address this challenge, this paper reports a study in which we collected, from ten active projects, code smells that were actually refactored by developers, unlike studies that rely on code smells inferred by researchers. These smells were used to evaluate the accuracy of seven ML techniques in the early detection of code smells. Because these smells were considered important by developers, the ML techniques can customize the detection to focus on smells observed as relevant in the investigated systems. The results show that all the analyzed techniques are sensitive to the type of smell and that they obtained good results for most smell types, especially JRip and Random Forest. We also observe that the ML techniques did not need a large number of examples to reach their best accuracy. This finding implies that ML techniques can be used successfully for early detection of smells without depending on the curation of a large dataset.
Keywords: software quality, code smell detection, code smell
Published
21 October 2020
OLIVEIRA, Daniel; ASSUNÇÃO, Wesley K. G.; SOUZA, Leonardo; OIZUMI, Willian; GARCIA, Alessandro; FONSECA, Baldoino. Applying Machine Learning to Customized Smell Detection: A Multi-Project Study. In: SIMPÓSIO BRASILEIRO DE ENGENHARIA DE SOFTWARE (SBES), 34., 2020, Natal. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2020.