Is P-value<0.05 Enough? Two Case Studies in Classifiers Evaluation

  • Nadine M. Neumann UFF
  • Alexandre Plastino UFF
  • Jony A. Pinto Junior UFF
  • Alex A. Freitas University of Kent

Abstract


Statistical significance analysis, performed through hypothesis testing, is a common tool for comparing classifiers. However, many researchers seek statistical significance by blindly checking the p-value<0.05 condition, ignoring important concepts such as effect size and statistical power. This work highlights problems that can arise from misusing hypothesis tests and shows how effect size and statistical power can inform better decision making. To that end, it presents two case studies that apply Student’s t-test and the Wilcoxon signed-rank test to the comparison of two classifiers.
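
As a rough illustration of the concern raised in the abstract, the sketch below compares two hypothetical classifiers on made-up paired accuracies (not data from the paper). It assumes NumPy and SciPy are available. It uses Cohen's d on the paired differences as the effect size, which is one common choice and may differ from the measure used in the case studies. The names acc_a, acc_b, cohen_dz and paired_t_power are illustrative only.

```python
import numpy as np
from scipy import stats

# Hypothetical setup: 1000 paired evaluations of two classifiers.
# The accuracy differences are made up and deterministic:
# mean 0.002, standard deviation about 0.02.
n = 1000
diffs = np.tile([0.022, -0.018], n // 2)
acc_a = np.linspace(0.70, 0.90, n)   # classifier A (illustrative values)
acc_b = acc_a + diffs                # classifier B, slightly better on average

# Paired Student's t-test and Wilcoxon signed-rank test on the same pairs.
t_res = stats.ttest_rel(acc_b, acc_a)
w_res = stats.wilcoxon(acc_b, acc_a)

# Effect size: Cohen's d on the paired differences (an assumed choice).
d = acc_b - acc_a
cohen_dz = d.mean() / d.std(ddof=1)


def paired_t_power(effect, n_pairs, alpha=0.05):
    """Power of a two-sided paired t-test for a standardized effect size."""
    df = n_pairs - 1
    nc = effect * np.sqrt(n_pairs)            # noncentrality parameter
    t_crit = stats.t.ppf(1 - alpha / 2, df)   # two-sided critical value
    return stats.nct.sf(t_crit, df, nc) + stats.nct.cdf(-t_crit, df, nc)


print(f"t-test p-value:   {t_res.pvalue:.4g}")
print(f"Wilcoxon p-value: {w_res.pvalue:.4g}")
print(f"Cohen's d (paired differences): {cohen_dz:.3f}")
print(f"power for d=0.1, n=1000: {paired_t_power(0.1, 1000):.2f}")
print(f"power for d=0.1, n=30:   {paired_t_power(0.1, 30):.2f}")
```

In this constructed example both tests report p-value<0.05 while the standardized effect stays near 0.1, conventionally regarded as small. The power calculation suggests that the same effect would rarely be detected with only 30 pairs. Reading the p-value together with the effect size and the power gives a fuller picture than the threshold alone.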
Published
2018-10-22
NEUMANN, Nadine M.; PLASTINO, Alexandre; PINTO JUNIOR, Jony A.; FREITAS, Alex A. Is P-value<0.05 Enough? Two Case Studies in Classifiers Evaluation. In: NATIONAL MEETING ON ARTIFICIAL AND COMPUTATIONAL INTELLIGENCE (ENIAC), 15., 2018, São Paulo. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2018. p. 94-103. ISSN 2763-9061. DOI: https://doi.org/10.5753/eniac.2018.4407.