Abstract
Statistical hypothesis tests play a crucial role in evaluating the performance of machine learning (ML) models and in selecting the best model among a set of candidates. However, their effectiveness for model selection over longer periods of time remains unclear. This study investigates the impact of statistical tests on ML model selection in sequential experiments. Specifically, we examine whether selecting models based on statistical tests leads to higher-quality models after a significant number of iterations, and we explore the effect of the number of tests performed and which statistical test is preferable for different experimental time horizons.
Our experiments on binary classification problems show that statistical tests should be used with caution, particularly in challenging scenarios where generating improved models is difficult. The analysis shows that statistical tests may impede progress by imposing overly stringent acceptance criteria for new models, which hinders the selection of high-quality models. The versions without statistical tests also remained consistently dominant, suggesting the need for further research in this area.
Although this study is limited by the number of datasets and the absence of pre-test assumption verification, it emphasizes the importance of understanding the impact of statistical tests on ML model selection.
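To make the two acceptance rules compared in this study concrete, the following is a minimal sketch, not the authors' implementation. It assumes each model is scored on the same cross-validation folds and that a candidate replaces the current model either when its mean score is higher (no test) or when, in addition, a paired test rejects at level alpha. The abstract does not state which tests were used; an exact sign-flip permutation test is swapped in here only so that the sketch runs with the Python standard library. All function names and the score values are hypothetical.

```python
from itertools import product
from statistics import mean


def sign_flip_p_value(new_scores, old_scores):
    """One-sided exact paired permutation (sign-flip) test.

    Enumerates every sign assignment of the per-fold differences and returns
    the fraction whose summed difference is at least the observed one.
    """
    diffs = [n - o for n, o in zip(new_scores, old_scores)]
    observed = sum(diffs)
    count = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        # Small tolerance so floating-point noise does not break ties.
        if sum(s * d for s, d in zip(signs, diffs)) >= observed - 1e-12:
            count += 1
    return count / total


def accept_without_test(new_scores, old_scores):
    """Accept the candidate whenever its mean fold score is higher."""
    return mean(new_scores) > mean(old_scores)


def accept_with_test(new_scores, old_scores, alpha=0.05):
    """Accept only if the mean is higher AND the paired test rejects at alpha."""
    return (mean(new_scores) > mean(old_scores)
            and sign_flip_p_value(new_scores, old_scores) < alpha)


if __name__ == "__main__":
    # Hypothetical per-fold scores for the current and a candidate model.
    current = [0.80, 0.82, 0.79, 0.81, 0.80]
    candidate = [0.81, 0.81, 0.80, 0.82, 0.80]
    print("p-value:", sign_flip_p_value(candidate, current))
    print("accept without test:", accept_without_test(candidate, current))
    print("accept with test:", accept_with_test(candidate, current))
```

In this illustrative case the candidate is slightly better on average, so the rule without a test accepts it, while the test-based rule rejects it because the improvement is too small to reach significance on five folds. This is the kind of stricter acceptance criterion that the abstract reports can slow progress when improved models are hard to generate.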
Acknowledgments
This work was supported by CNPq - National Council for Scientific and Technological Development, CAPES - Coordination for the Improvement of Higher Education Personnel and UFOP - Federal University of Ouro Preto.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Gonçalves, M.C., Silva, R. (2023). The Effect of Statistical Hypothesis Testing on Machine Learning Model Selection. In: Naldi, M.C., Bianchi, R.A.C. (eds) Intelligent Systems. BRACIS 2023. Lecture Notes in Computer Science(), vol 14196. Springer, Cham. https://doi.org/10.1007/978-3-031-45389-2_28
DOI: https://doi.org/10.1007/978-3-031-45389-2_28
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-45388-5
Online ISBN: 978-3-031-45389-2
eBook Packages: Computer Science (R0)