The Effect of Statistical Hypothesis Testing on Machine Learning Model Selection

Gonçalves, Marcel Chacon; Silva, Rodrigo

doi:10.1007/978-3-031-45389-2_28

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 14196))

Included in the following conference series:

Brazilian Conference on Intelligent Systems

255 Accesses

Abstract

Statistical tests of hypothesis play a crucial role in evaluating the performance of machine learning (ML) models and selecting the best model among a set of candidates. However, their effectiveness in selecting models over larger periods of time remains unclear. This study aims to investigate the impact of statistical tests on ML model selection in sequential experiments. Specifically, we examine whether selecting models based on statistical tests leads to higher quality models after a significant number of iterations and explore the effect of the number of tests performed and the preferred statistical test for different experimental time horizons.

The study on binary classification problems reveals that the use of statistical tests should be approached with caution, particularly in challenging scenarios where generating improved models is difficult. The analysis demonstrates that statistical tests may impede progress and impose overly stringent acceptance criteria for new models, hindering the selection of high-quality models. The findings also indicate that the dominance of versions without statistical tests remained consistent, suggesting the need for further research in this area.

Although this study is limited by the number of datasets and the absence of pre-test assumption verification, it emphasizes the importance of understanding the impact of statistical tests on ML model selection.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 59.99; Price excludes VAT (USA)

Softcover Book: USD 79.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

Aygun, B., Gunay, E.K.: Comparison of statistical and machine learning algorithms for forecasting daily bitcoin returns. Avrupa Bilim ve Teknoloji Dergisi (21), pp. 444–454 (2021)
Google Scholar
Bao, D., et al.: Discriminating between p16-negative oropharyngeal and non-oropharyngeal origins by their metastatic lymph nodes using machine learning approach based on MRI radiomics (2022)
Google Scholar
Benavoli, A., Corani, G., Demšar, J., Zaffalon, M.: Time for a change: a tutorial for comparing multiple classifiers through bayesian analysis. J. Mach. Learn. Res. 18(77), 1–36 (2017). http://jmlr.org/papers/v18/16-305.html
Bender, A., Schneider, N., Segler, M., Patrick Walters, W., Engkvist, O., Rodrigues, T.: Evaluation guidelines for machine learning tools in the chemical sciences. Nat. Rev. Chem. 6(6), 428–442 (2022)
Article Google Scholar
Corani, G., Benavoli, A.: A bayesian approach for comparing cross-validated algorithms on multiple data sets. Mach. Learn. 100(2–3), 285–304 (2015)
Article MathSciNet MATH Google Scholar
Dua, D., Graff, C.: UCI machine learning repository (2017). http://archive.ics.uci.edu/ml
Fagerland, M.W.: t-tests, non-parametric tests, and large studies-a paradox of statistical practice? BMC Med. Res. Methodol. 12(1), 1–7 (2012)
Article Google Scholar
Hair, J.F., Jr., Sarstedt, M.: Data, measurement, and causal inferences in machine learning: opportunities and challenges for marketing. J. Market. Theory Practice 29(1), 65–77 (2021)
Article Google Scholar
Hopkins, M., Reeber, E., Forman, G., Suermondt, J.: Spambase. UCI Machine Learning Repository (1999). https://doi.org/10.24432/C53G6X
Article Google Scholar
Janosi, A., Steinbrunn, W., Pfisterer, M., Detrano, R., M.D., M.: Heart Disease. UCI Machine Learning Repository (1988). https://doi.org/10.24432/C52P4X
Kim, T.K.: T test as a parametric statistic. Korean J. Anesthesiol. 68(6), 540–546 (2015)
Article MathSciNet Google Scholar
Morettin, P.A., Bussab, W.O.: Estatística básica. Saraiva Educação SA (2017)
Google Scholar
Moro, S., Rita, P., Cortez, P.: Bank Marketing. UCI Machine Learning Repository (2012). https://doi.org/10.24432/C5K306
Article Google Scholar
Trawiński, B., Smetek, M., Telec, Z., Lasota, T.: Nonparametric statistical analysis for multiple comparison of machine learning regression algorithms. Int. J. Appl. Math. Comput. Sci. 22(4), 867–881 (2012)
Article MathSciNet MATH Google Scholar
Van Rijsbergen, C.J.: Information retrieval. (No Title) (1979)
Google Scholar
Virtanen, P., et al.: SciPy 1.0 Contributors: SciPy 1.0: fundamental algorithms for scientific computing in python. Nature Methods 17, 261–272 (2020). https://doi.org/10.1038/s41592-019-0686-2
Wong, T.T., Yeh, P.Y.: Reliable accuracy estimates from k-fold cross validation. IEEE Trans. Knowl. Data Eng. 32(8), 1586–1594 (2019)
Article Google Scholar
Yeh, I.C.: default of credit card clients. UCI Mach. Learn. Repository (2016). https://doi.org/10.24432/C55S3H
Article Google Scholar

Download references

Acknowledgments

This work was supported by CNPq - National Council for Scientific and Technological Development, CAPES - Coordination for the Improvement of Higher Education Personnel and UFOP - Federal University of Ouro Preto.

Author information

Authors and Affiliations

Graduate Program on Computer Science, Universidade Federal de Ouro Preto, Ouro Preto, Brazil
Marcel Chacon Gonçalves
Department of Computer Science, Universidade Federal de Ouro Preto, Ouro Preto, Brazil
Rodrigo Silva

Authors

Marcel Chacon Gonçalves
View author publications
You can also search for this author in PubMed Google Scholar
Rodrigo Silva
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Rodrigo Silva .

Editor information

Editors and Affiliations

Federal University of São Carlos, São Carlos, Brazil
Murilo C. Naldi
Centro Universitario da FEI, São Bernardo do Campo, Brazil
Reinaldo A. C. Bianchi

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Gonçalves, M.C., Silva, R. (2023). The Effect of Statistical Hypothesis Testing on Machine Learning Model Selection. In: Naldi, M.C., Bianchi, R.A.C. (eds) Intelligent Systems. BRACIS 2023. Lecture Notes in Computer Science(), vol 14196. Springer, Cham. https://doi.org/10.1007/978-3-031-45389-2_28

Download citation

DOI: https://doi.org/10.1007/978-3-031-45389-2_28
Published: 12 October 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-45388-5
Online ISBN: 978-3-031-45389-2
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

The Effect of Statistical Hypothesis Testing on Machine Learning Model Selection