Interpreting Classification Models Using Feature Importance Based on Marginal Local Effects
Machine learning models are widespread in many fields due to their remarkable performance on a wide range of tasks. Some of these fields require greater interpretability, which often means understanding the mechanisms underlying the algorithms. Feature importance is the most common form of explanation and is essential in data mining, especially in applied research. Analysts frequently need to compare the effect of features over time, across models, or even across studies. For this, a single metric per feature, shared by all scenarios, may be more suitable, giving analysts better first-order insights into feature behavior across these different settings. The β-coefficients of additive models, such as logistic regressions, have been widely used for this purpose: they describe the relationship between a predictor and the outcome in a single number that indicates both its direction and its size. For black-box models, however, no metric with these characteristics exists, and even the β-coefficients of logistic regression have limitations. This paper therefore discusses these limitations, reviews the existing alternatives for overcoming them, and proposes new metrics of feature importance. Like the coefficients, these metrics indicate the size and direction of a feature's effect, but on the probability scale and within a model-agnostic framework. An experiment on openly available breast cancer data from the UCI Archive verified the suitability of these metrics, and another on real-world data demonstrated how they can be helpful in practice.
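The abstract does not spell out how such a signed, probability-scale metric is computed, so the following is only an illustrative sketch of the general idea it describes: accumulating marginal local effects of one feature (in the spirit of accumulated local effects) into a single number whose sign gives the direction and whose magnitude gives the size of the effect. The function name `local_effect_importance`, the quantile binning, and the toy model below are all assumptions for illustration, not the paper's actual definition.

```python
import numpy as np

def local_effect_importance(predict_proba, X, j, n_bins=10):
    """Sketch of a signed, model-agnostic importance on the probability scale.

    Partition feature j into quantile bins; within each bin, measure the
    average change in predicted probability when the feature is moved from
    the bin's lower to its upper edge with all other features held fixed
    (the local-effect idea), and accumulate these averages over the
    feature's range.  The result is one signed number per feature,
    analogous in spirit to a logistic regression β-coefficient.
    """
    x = X[:, j]
    edges = np.quantile(x, np.linspace(0.0, 1.0, n_bins + 1))
    total = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (x >= lo) & (x <= hi)
        if not mask.any():
            continue
        X_lo, X_hi = X[mask].copy(), X[mask].copy()
        X_lo[:, j] = lo  # feature pinned to bin's lower edge
        X_hi[:, j] = hi  # feature pinned to bin's upper edge
        total += np.mean(predict_proba(X_hi) - predict_proba(X_lo))
    return total

# Toy black-box: a sigmoid with a positive effect for feature 0
# and a negative effect for feature 1 (assumed for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
black_box = lambda X: 1.0 / (1.0 + np.exp(-(2.0 * X[:, 0] - X[:, 1])))

imp0 = local_effect_importance(black_box, X, 0)
imp1 = local_effect_importance(black_box, X, 1)
```

Under these assumptions, `imp0` comes out positive and `imp1` negative, matching the signs of the underlying coefficients while remaining agnostic to the model's internals.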