Combining semi-supervision and hubness to improve clustering of high-dimensional data

  • Mateus C. de Lima, Federal University of Uberlândia
  • Maria Camila N. Barioni, Federal University of Uberlândia
  • Humberto L. Razente, Federal University of Uberlândia

Abstract


The curse of dimensionality makes the analysis of high-dimensional data a challenging task for data clustering techniques. To deal with high-dimensional data, this paper presents a clustering approach that combines two strategies: semi-supervision and density estimation based on hubness scores. Initial experimental results show good performance on real data sets with different characteristics.
Keywords: Semi-Supervision, Hubness, Data Clustering, High Dimension
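
The abstract does not detail how hubness scores are computed. As a minimal sketch, assuming the common definition in which the hubness score N_k(x) of a point x is the number of other points that include x among their k nearest neighbors, and assuming Euclidean distance, the scores could be computed as follows. The function name `hubness_scores` is hypothetical, and this is an illustration rather than the authors' implementation.

```python
import numpy as np


def hubness_scores(X, k):
    """Return N_k for each row of X: the number of other points that
    list it among their k nearest neighbors (Euclidean distance assumed)."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Pairwise squared Euclidean distances (the ordering matches true distances).
    diff = X[:, None, :] - X[None, :, :]
    dist = np.einsum("ijk,ijk->ij", diff, diff)
    # A point is never counted as its own neighbor.
    np.fill_diagonal(dist, np.inf)
    # Indices of the k nearest neighbors of every point.
    knn = np.argsort(dist, axis=1)[:, :k]
    # Count how often each point occurs across all neighbor lists.
    return np.bincount(knn.ravel(), minlength=n)


# Four points on a line; with k = 1 the second point is the nearest
# neighbor of two others, so it receives the highest score.
points = np.array([[0.0], [1.0], [3.0], [7.0]])
scores = hubness_scores(points, k=1)
```

Points with high scores (hubs) tend to lie near dense regions, which is why such scores can serve as a rough density estimate. This brute-force version needs memory quadratic in the number of points, so it is only suitable for small data sets.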

Published
2016-10-04
DE LIMA, Mateus C.; BARIONI, Maria Camila N.; RAZENTE, Humberto L. Combining semi-supervision and hubness to improve clustering of high-dimensional data. In: BRAZILIAN SYMPOSIUM ON DATABASES (SBBD), 31., 2016, Salvador/BA. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2016. p. 139-144. ISSN 2763-8979. DOI: https://doi.org/10.5753/sbbd.2016.24318.