The Effect of Statistical Hypothesis Testing on Machine Learning Model Selection

Marcel Chacon Gonçalves; Rodrigo Silva

Marcel Chacon Gonçalves UFOP
Rodrigo Silva UFOP https://orcid.org/0000-0003-2547-3835

Abstract

Statistical hypothesis tests play a crucial role in evaluating the performance of machine learning (ML) models and in selecting the best model among a set of candidates. However, their effectiveness for model selection over longer time horizons remains unclear. This study investigates the impact of statistical tests on ML model selection in sequential experiments. Specifically, we examine whether selecting models based on statistical tests leads to higher-quality models after a large number of iterations, and we explore the effect of the number of tests performed and which statistical test is preferable for different experimental time horizons. Our study of binary classification problems indicates that statistical tests should be used with caution, particularly in challenging scenarios where generating improved models is difficult. The analysis shows that statistical tests may impede progress by imposing overly stringent acceptance criteria for new models, which hinders the selection of high-quality models. The versions without statistical tests also remained consistently dominant, suggesting the need for further research in this area. Although this study is limited by the number of datasets and the absence of pre-test assumption verification, it underscores the importance of understanding the impact of statistical tests on ML model selection.
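As a rough illustration of the sequential setting described in the abstract, the Python sketch below contrasts two acceptance rules for a candidate model: replace the incumbent whenever the candidate's mean score is higher, or replace it only when a one-sided paired t-test is significant. This is not the authors' experimental protocol. The simulated fold scores, the noise levels, the starting quality of 0.70, the choice of a paired t-test at the 5% level, and every function name are assumptions made purely for illustration.

```python
import math
import random
import statistics

# One-sided critical value of Student's t at alpha = 0.05 with 9 degrees of
# freedom, i.e. for 10 paired fold scores (assumed setup).
T_CRIT_ONE_SIDED_DF9 = 1.833


def paired_t_statistic(a, b):
    """Return the paired t statistic for the differences a[i] - b[i]."""
    diffs = [x - y for x, y in zip(a, b)]
    mean_diff = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    if sd == 0.0:
        # Degenerate case: all differences are identical.
        return 0.0 if mean_diff == 0 else math.copysign(math.inf, mean_diff)
    return mean_diff / (sd / math.sqrt(len(diffs)))


def simulated_fold_scores(true_quality, rng, n_folds=10, noise=0.02):
    """Hypothetical stand-in for k-fold evaluation: noisy views of true quality."""
    return [true_quality + rng.gauss(0.0, noise) for _ in range(n_folds)]


def run_sequential_selection(use_test, n_iterations=200, seed=0):
    """Simulate a sequence of candidate models and return the final true quality."""
    rng = random.Random(seed)
    incumbent_quality = 0.70  # assumed starting point
    for _ in range(n_iterations):
        # Each candidate is a small random perturbation of the incumbent (assumption).
        candidate_quality = incumbent_quality + rng.gauss(0.0, 0.01)
        incumbent_scores = simulated_fold_scores(incumbent_quality, rng)
        candidate_scores = simulated_fold_scores(candidate_quality, rng)
        if use_test:
            # Accept only if the candidate is significantly better.
            t_stat = paired_t_statistic(candidate_scores, incumbent_scores)
            accept = t_stat > T_CRIT_ONE_SIDED_DF9
        else:
            # Accept whenever the observed mean score is higher.
            accept = statistics.mean(candidate_scores) > statistics.mean(incumbent_scores)
        if accept:
            incumbent_quality = candidate_quality
    return incumbent_quality
```

Calling `run_sequential_selection(use_test=True)` and `run_sequential_selection(use_test=False)` over several seeds gives a feel for how strict acceptance criteria interact with noisy evaluations. The outcome depends entirely on the assumed noise and step sizes, so it should not be read as reproducing the study's results.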