The Effect of Statistical Hypothesis Testing on Machine Learning Model Selection

  • Conference paper
Intelligent Systems (BRACIS 2023)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14196)

Abstract

Statistical hypothesis tests play a crucial role in evaluating the performance of machine learning (ML) models and in selecting the best model among a set of candidates. However, their effectiveness in selecting models over longer periods of time remains unclear. This study investigates the impact of statistical tests on ML model selection in sequential experiments. Specifically, we examine whether selecting models based on statistical tests leads to higher-quality models after a large number of iterations, and we explore how the number of tests performed and the choice of statistical test affect the outcome for different experimental time horizons.

The study on binary classification problems reveals that the use of statistical tests should be approached with caution, particularly in challenging scenarios where generating improved models is difficult. The analysis demonstrates that statistical tests may impede progress and impose overly stringent acceptance criteria for new models, hindering the selection of high-quality models. The findings also indicate that the dominance of versions without statistical tests remained consistent, suggesting the need for further research in this area.

Although this study is limited by the number of datasets and the absence of pre-test assumption verification, it emphasizes the importance of understanding the impact of statistical tests on ML model selection.
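
The sequential selection setting described above can be illustrated with a small simulation. The sketch below is not the authors' experimental protocol: the one-sided paired t-test, the 10-fold scores, the noise levels, and all function names are assumptions chosen for illustration, and the scores are synthetic. It contrasts an acceptance rule that requires a significant improvement with one that accepts any candidate whose mean score is higher.

```python
import random
import statistics

# One-sided critical value of Student's t for 9 degrees of freedom
# (10 paired folds) at alpha = 0.05.
T_CRIT_DF9 = 1.833


def paired_t_statistic(incumbent, candidate):
    """t statistic for the mean of the paired differences candidate - incumbent."""
    diffs = [c - i for i, c in zip(incumbent, candidate)]
    mean = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    if sd == 0:
        # Degenerate case: all differences are identical.
        if mean == 0:
            return 0.0
        return float("inf") if mean > 0 else float("-inf")
    return mean / (sd / len(diffs) ** 0.5)


def accept_with_test(incumbent, candidate):
    """Accept the candidate only if it is significantly better (assumes 10 folds)."""
    return paired_t_statistic(incumbent, candidate) > T_CRIT_DF9


def accept_without_test(incumbent, candidate):
    """Accept the candidate whenever its mean score is higher."""
    return statistics.mean(candidate) > statistics.mean(incumbent)


def simulate(accept, iterations=200, folds=10, seed=0):
    """Run a hypothetical sequential experiment and return the final true quality."""
    rng = random.Random(seed)
    true_quality = 0.70  # assumed true score of the initial model
    for _ in range(iterations):
        # Each new candidate is a small random perturbation of the incumbent.
        cand_quality = true_quality + rng.gauss(0.0, 0.01)
        # Shared per-fold noise makes the two score lists paired.
        fold_noise = [rng.gauss(0.0, 0.02) for _ in range(folds)]
        inc_scores = [true_quality + n + rng.gauss(0.0, 0.01) for n in fold_noise]
        cand_scores = [cand_quality + n + rng.gauss(0.0, 0.01) for n in fold_noise]
        if accept(inc_scores, cand_scores):
            true_quality = cand_quality
    return true_quality


if __name__ == "__main__":
    print("with test:   ", simulate(accept_with_test))
    print("without test:", simulate(accept_without_test))
```

Comparing the two returned values over several seeds and iteration counts gives a rough feel for how a stricter acceptance criterion trades off fewer mistaken replacements against fewer accepted improvements. It says nothing about the paper's actual results.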

Acknowledgments

This work was supported by CNPq - National Council for Scientific and Technological Development, CAPES - Coordination for the Improvement of Higher Education Personnel and UFOP - Federal University of Ouro Preto.

Author information

Corresponding author

Correspondence to Rodrigo Silva.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Gonçalves, M.C., Silva, R. (2023). The Effect of Statistical Hypothesis Testing on Machine Learning Model Selection. In: Naldi, M.C., Bianchi, R.A.C. (eds) Intelligent Systems. BRACIS 2023. Lecture Notes in Computer Science, vol 14196. Springer, Cham. https://doi.org/10.1007/978-3-031-45389-2_28

  • DOI: https://doi.org/10.1007/978-3-031-45389-2_28

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-45388-5

  • Online ISBN: 978-3-031-45389-2

  • eBook Packages: Computer Science, Computer Science (R0)
