Dimension Reduction for Projective Clustering
Abstract
The high dimensionality of data can be a barrier to algorithmic efficiency, mainly because of the well-known “curse of dimensionality”, which imposes exponential time and/or memory requirements on algorithms. It is therefore natural to look for ways to break this curse, both by relaxing the problem to approximate versions and by reducing the dimension of the data. Our aim is to integrate, and state in slightly stronger form, approximation results obtained via dimension reduction for clustering under the l_2^2 metric. The dimension reduction is achieved by combining randomized techniques (the Johnson-Lindenstrauss Lemma) with deterministic techniques (the singular value decomposition).
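To make the combined pipeline concrete, the minimal Python/NumPy sketch below illustrates the two stages under illustrative assumptions (the data sizes, the target dimension m, and the rank r are chosen only for demonstration and are not values from this work): a Johnson-Lindenstrauss-style Gaussian random projection followed by a truncated SVD, after which the reduced data can be handed to any sum-of-squares (l_2^2) clustering routine.

import numpy as np

rng = np.random.default_rng(0)

# Toy data: n points in d dimensions (sizes are illustrative only).
n, d, k = 1000, 500, 10
A = rng.standard_normal((n, d))

# Stage 1 (randomized): Johnson-Lindenstrauss-style projection.
# A Gaussian matrix scaled by 1/sqrt(m) preserves pairwise squared
# distances up to a (1 +/- eps) factor with high probability when
# m = O(log(n) / eps^2).
eps = 0.5
m = int(np.ceil(8 * np.log(n) / eps**2))
S = rng.standard_normal((d, m)) / np.sqrt(m)
A_jl = A @ S                       # n x m randomized sketch of the data

# Stage 2 (deterministic): truncated SVD, i.e. projection onto the
# top singular directions, which captures the subspace relevant for
# sum-of-squares clustering objectives.
r = 2 * k                          # rank chosen as a function of k (illustrative)
U, s, Vt = np.linalg.svd(A_jl, full_matrices=False)
A_reduced = A_jl @ Vt[:r].T        # n x r coordinates in the top-r subspace

print(A.shape, "->", A_jl.shape, "->", A_reduced.shape)
# A_reduced can now be passed to a k-means or projective clustering algorithm.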
References
Aloise, D., Deshpande, A., Hansen, P., and Popat, P. (2009). NP-hardness of Euclidean sum-of-squares clustering. Machine Learning, 75(2):245–248.
Feldman, D., Schmidt, M., and Sohler, C. (2020). Turning big data into tiny data: Constant-size coresets for k-means, PCA, and projective clustering. SIAM J. Comput., 49(3):601–657.
Johnson, W. B. and Lindenstrauss, J. (1984). Extensions of Lipschitz mappings into a Hilbert space. In Conference in modern analysis and probability (New Haven, Conn., 1982), volume 26 of Contemp. Math., pages 189–206. Amer. Math. Soc., Providence, RI.
Megiddo, N. and Tamir, A. (1982). On the complexity of locating linear facilities in the plane. Operations Research Letters, 1(5):194–197.
Pratap, R. and Sen, S. (2018). Faster coreset construction for projective clustering via low-rank approximation. In International Workshop on Combinatorial Algorithms, pages 336–348. Springer.
Sarlós, T. (2006). Improved approximation algorithms for large matrices via random projections. In 2006 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS’06), pages 143–152. IEEE.