A Study of Variants of the Silhouette Validation Index
Abstract
This paper evaluates five variants of the silhouette index by how well they identify high-quality solutions to clustering problems. Five computational experiments were carried out on 51 diverse datasets, both real and artificial. Euclidean and Manhattan distances were used as dissimilarity measures, and PAM, DBSCAN, and Bisecting k-means as clustering algorithms. The results indicate that the median-based variant is a good alternative for identifying high-quality solutions.
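For readers unfamiliar with the index, the sketch below computes the classic average silhouette width under either of the two dissimilarity measures named above. For each point i, a(i) summarizes its distances to the other members of its own cluster, b(i) is the smallest such summary over the remaining clusters, and s(i) = (b(i) - a(i)) / max(a(i), b(i)). The abstract does not define the median-based variant. As an assumption for illustration only, the `center` parameter lets a(i) and b(i) be computed with the median instead of the mean. The function names `silhouette`, `euclidean`, and `manhattan` are hypothetical and are not taken from the paper.

```python
import math
import statistics
from typing import Callable, Dict, List, Sequence

Point = Sequence[float]
Metric = Callable[[Point, Point], float]
Center = Callable[[List[float]], float]


def euclidean(p: Point, q: Point) -> float:
    """Euclidean (L2) distance between two points."""
    return math.dist(p, q)


def manhattan(p: Point, q: Point) -> float:
    """Manhattan (L1) distance between two points."""
    return sum(abs(x - y) for x, y in zip(p, q))


def silhouette(points: Sequence[Point], labels: Sequence[int],
               metric: Metric = euclidean,
               center: Center = statistics.mean) -> float:
    """Average silhouette width over all points.

    center=statistics.mean gives the classic index. Passing
    center=statistics.median is a HYPOTHETICAL median-based variant
    (a(i) and b(i) taken as medians), assumed here for illustration.
    """
    clusters: Dict[int, List[int]] = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")

    scores: List[float] = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            # Singleton cluster: s(i) = 0 by convention.
            scores.append(0.0)
            continue
        # a(i): typical dissimilarity to the point's own cluster.
        a = center([metric(points[i], points[j]) for j in own])
        # b(i): typical dissimilarity to the nearest other cluster.
        b = min(
            center([metric(points[i], points[j]) for j in members])
            for other, members in clusters.items() if other != lab
        )
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return statistics.mean(scores)


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
    good = [0, 0, 1, 1]  # two well-separated clusters
    bad = [0, 1, 0, 1]   # clusters that mix the two groups
    print(silhouette(pts, good, metric=manhattan))
    print(silhouette(pts, good, center=statistics.median))
    print(silhouette(pts, bad))
```

A well-separated partition scores close to 1, and a partition that mixes the groups scores below 0. This behavior is what allows the index to rank candidate solutions produced by algorithms such as PAM or DBSCAN.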
