Combining Semi-supervision and Hubness to Enhance High-dimensional Data Clustering

Authors

  • Mateus C. de Lima, Universidade Federal de Uberlândia
  • Maria Camila N. Barioni, Universidade Federal de Uberlândia
  • Humberto L. Razente, Universidade Federal de Uberlândia

DOI:

https://doi.org/10.5753/jidm.2017.1621

Keywords:

Data Mining, High-dimensional Data Analysis, Hubness, Semi-Supervised Clustering

Abstract

The curse of dimensionality makes high-dimensional data analysis a challenging task for data clustering techniques. Recent work has efficiently exploited hubness, an aspect inherent to high-dimensional data, to propose clustering approaches guided by hubs, which provide information about how the data instances are distributed among the K-nearest neighbors. However, hubs may not reflect the implicit data semantics well, which can lead to an unsuitable data partition. To cope with both issues (i.e., high-dimensional data and meaningful clusters), this paper presents a clustering approach that combines two strategies: semi-supervision and density estimation based on hubness scores. Experiments conducted on 23 real datasets show that the proposed approach achieves superior performance across datasets with different characteristics.
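
For readers unfamiliar with hubness scores, the following minimal Python sketch illustrates the standard k-occurrence count: the number of times a point appears in the K-nearest-neighbor lists of the other points. It is an illustration only and does not reproduce the clustering approach proposed in the paper. The function name `hubness_scores`, the use of Euclidean distance, and the brute-force neighbor search are assumptions made for this example.

```python
import numpy as np

def hubness_scores(X, k):
    """Return, for every row of X, how often that point appears in the
    k-nearest-neighbor lists of the other points (k-occurrence count).

    Hypothetical helper for illustration; Euclidean distance is assumed.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Pairwise squared Euclidean distances (brute force, fine for small n).
    diff = X[:, None, :] - X[None, :, :]
    dist = np.einsum("ijk,ijk->ij", diff, diff)
    # A point is never counted as its own neighbor.
    np.fill_diagonal(dist, np.inf)
    # Indices of the k nearest neighbors of each point.
    knn = np.argsort(dist, axis=1)[:, :k]
    # Count how many neighbor lists each index appears in.
    return np.bincount(knn.ravel(), minlength=n)

# Four points on a line; with k=1 the point at 1.0 is the nearest
# neighbor of two other points, so it has the highest score.
scores = hubness_scores([[0.0], [1.0], [3.0], [10.0]], k=1)
```

Points with a high score are the hubs. The scores always sum to n times k, since each of the n points contributes exactly k neighbors.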

Published

2017-12-08

How to Cite

de Lima, M. C., Barioni, M. C. N., & Razente, H. L. (2017). Combining Semi-supervision and Hubness to Enhance High-dimensional Data Clustering. Journal of Information and Data Management, 8(3), 223. https://doi.org/10.5753/jidm.2017.1621

Section

SBBD 2016