Understanding Thresholds of Software Features for Defect Prediction

  • Geanderson Santos UFMG
  • Adriano Veloso UFMG
  • Eduardo Figueiredo UFMG

Resumo


Software defect prediction is a subject of study involving the interplay of the software engineering and machine learning areas. The current literature proposed numerous machine learning models to predict software defects from software data, such as commits and code metrics. However, existing machine learning models are more valuable when we can understand the prediction. Otherwise, software developers cannot reason why a machine learning model made such predictions, generating many questions about the model’s applicability in software projects. As explainable machine learning models for the defect prediction problem remain a recent research topic, it leaves room for exploration. In this paper, we propose a preliminary analysis of an extensive dataset to predict software defects. The dataset includes 47,618 classes from 53 open-source projects and covers 66 software features related to numerous features of the code. Therefore, we offer contributions on explaining how each selected software feature favors the prediction of software defects in Java projects. Our initial results suggest that developers should keep the values of some specific software features small to avoid software defects. We hope our approach can guide more discussions about explainable machine learning for defect prediction and its impact on software development.
Palavras-chave: explainable machine learning, software features for defect prediction, defect prediction
Publicado
03/10/2022
SANTOS, Geanderson; VELOSO, Adriano; FIGUEIREDO, Eduardo. Understanding Thresholds of Software Features for Defect Prediction. In: SIMPÓSIO BRASILEIRO DE ENGENHARIA DE SOFTWARE (SBES), 36. , 2022, Uberlândia. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2022 . p. 305–310.