Equivalence between the Area under the Kolmogorov-Smirnov Curve and the Gini Index in the Performance Evaluation of Binary Decisions

  • Paulo J. L. Adeodato, Federal University of Pernambuco
  • Sílvio B. Melo, Federal University of Pernambuco

Abstract


This paper proposes and proves the equivalence between the Gini index and the area under the Kolmogorov-Smirnov (KS) distribution curve. The proof follows the same rationale as the proof of equivalence between AUC_ROC and AUC_KS but, unlike that proof, it uses a transformation that preserves the 1-to-1 correspondence between the ideal classifiers in the KS and Lorenz curve domains. In terms of metrics, the paper proves that the ratio of the Gini index to that of the ideal classifier equals the ratio of the area under the KS curve to that of its ideal classifier, that is, Gini_Index_Ratio = AUC_KS_Ratio. This result complements the previously proven equivalence between the KS and ROC area metrics, extending it to the Gini index.
Keywords: Gini Index, Kolmogorov-Smirnov, Equivalence between metrics
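
The stated identity Gini_Index_Ratio = AUC_KS_Ratio can be checked numerically. The sketch below is an illustration under assumptions that the abstract does not spell out: examples are ranked by descending score, both curves are plotted against the fraction of the population examined, the Lorenz curve is taken as the cumulative fraction of positives captured, and the KS curve as the difference between the cumulative fractions of positives and negatives. The function names and the synthetic data are hypothetical and are not taken from the paper.

```python
import random


def cumulative_curves(scores, labels):
    """Rank by descending score; return population fraction and the
    cumulative fraction of positives and of negatives at each rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n, n_pos = len(labels), sum(labels)
    n_neg = n - n_pos
    p, f_pos, f_neg = [0.0], [0.0], [0.0]
    pos = neg = 0
    for k, i in enumerate(order, start=1):
        pos += labels[i]
        neg += 1 - labels[i]
        p.append(k / n)
        f_pos.append(pos / n_pos)
        f_neg.append(neg / n_neg)
    return p, f_pos, f_neg


def trapezoid(x, y):
    """Area under the piecewise-linear curve through the points (x, y)."""
    return sum((x[i + 1] - x[i]) * (y[i] + y[i + 1]) / 2 for i in range(len(x) - 1))


def areas(scores, labels):
    """Return (Gini area, KS area) under the assumed definitions."""
    p, f_pos, f_neg = cumulative_curves(scores, labels)
    gini_area = trapezoid(p, [a - x for a, x in zip(f_pos, p)])      # Lorenz curve minus diagonal
    ks_area = trapezoid(p, [a - b for a, b in zip(f_pos, f_neg)])    # area under the KS curve
    return gini_area, ks_area


def ratios(scores, labels):
    """Return (Gini_Index_Ratio, AUC_KS_Ratio) relative to the ideal classifier."""
    gini_area, ks_area = areas(scores, labels)
    # Assumed ideal classifier: ranks every positive ahead of every negative.
    gini_ideal, ks_ideal = areas(labels, labels)
    return gini_area / gini_ideal, ks_area / ks_ideal


# Synthetic data (an assumption for illustration only): 60 positives, 140 negatives.
random.seed(0)
labels = [1] * 60 + [0] * 140
scores = [random.gauss(1.0 if y else 0.0, 1.0) for y in labels]

gini_ratio, ks_ratio = ratios(scores, labels)
print(gini_ratio, ks_ratio)
```

Under these assumptions the two printed ratios agree up to floating-point error, because at every rank the Lorenz gap equals the KS gap scaled by the fraction of negatives, and the same factor appears for the ideal classifier.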

Published
2016-10-04
ADEODATO, Paulo J. L.; MELO, Sílvio B. Equivalence between the Area under the Kolmogorov-Smirnov Curve and the Gini Index in the Performance Evaluation of Binary Decisions. In: BRAZILIAN SYMPOSIUM ON DATABASES (SBBD), 31., 2016, Salvador/BA. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2016. p. 157-162. ISSN 2763-8979. DOI: https://doi.org/10.5753/sbbd.2016.24321.