Exploring centroids initialization within Deep Convolutional Embedded Clustering

Leonardo Nogueira; Adriane Serapião

doi:10.5753/eniac.2019.9307

Leonardo Nogueira, Universidade Estadual Paulista
Adriane Serapião, Universidade Estadual Paulista "Júlio de Mesquita Filho"

Abstract

Deep clustering uses a deep neural network to learn a deep feature representation for clustering tasks. In this paper, we explore the Deep Convolutional Embedded Clustering (DCEC) method, which uses a standard clustering method to obtain the initial weights for training the neural model, and we combine it with other clustering methods. The original DCEC uses K-Means with Euclidean distance for the cluster center initialization step. We apply K-Means with Mahalanobis distance instead of Euclidean distance. To improve DCEC performance, we also include the standard K-Harmonic Means clustering algorithm, which tries to overcome the dependency of K-Means performance on the initialization of the cluster centers. Kernel-based K-Harmonic Means is also introduced in this study to reduce the effect of outliers and noise. We evaluated these clustering approaches within DCEC on benchmark image datasets, and the results were better than the baseline.

Keywords: Deep Clustering, Deep Learning, Unsupervised Learning, K-Means, K-Harmonic Means
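
To make the center initialization idea more concrete, the following is a minimal NumPy sketch of K-Harmonic Means center estimation, using the soft membership and per-point weight formulation commonly given for the algorithm (for example in Hamerly and Elkan, 2002). It is not the authors' implementation: the function name `khm_centers`, the parameters `p`, `n_iter`, `eps`, `seed` and `init`, and the default `p=3.5` are assumptions made for illustration. In a DCEC-style pipeline, `X` would presumably hold the embeddings produced by the pretrained convolutional autoencoder, and the returned centers would replace the K-Means centers in the initialization step.

```python
import numpy as np


def khm_centers(X, k, p=3.5, n_iter=50, eps=1e-8, seed=0, init=None):
    """Estimate k cluster centers with K-Harmonic Means (illustrative sketch).

    X    : array of shape (n_points, n_features), e.g. autoencoder embeddings
    p    : exponent of the harmonic-mean objective (assumed default)
    init : optional array of shape (k, n_features) with starting centers
    """
    X = np.asarray(X, dtype=float)
    if init is None:
        # Start from k distinct data points chosen at random.
        rng = np.random.default_rng(seed)
        centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    else:
        centers = np.asarray(init, dtype=float).copy()

    for _ in range(n_iter):
        # Pairwise Euclidean distances, shape (n_points, k).
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.maximum(d, eps)  # guard against division by zero

        inv_p = d ** (-p)
        inv_p2 = d ** (-p - 2)

        # Soft membership of every point in every cluster (rows sum to 1).
        m = inv_p2 / inv_p2.sum(axis=1, keepdims=True)
        # Per-point weight: points far from all centers get more influence.
        w = inv_p2.sum(axis=1) / inv_p.sum(axis=1) ** 2

        # Each center becomes a weighted mean of all points.
        mw = m * w[:, None]
        centers = (mw.T @ X) / mw.sum(axis=0)[:, None]

    return centers
```

Because every point contributes to every center through the soft membership, the result tends to depend less on the starting centers than hard-assignment K-Means does, which is the property the abstract refers to. A Mahalanobis or kernel-based variant would replace the Euclidean distance computation in the loop; that substitution is not shown here.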

References

Goodfellow, I. J., Bengio, Y., and Courville, A. (2016). Deep Learning. MIT Press, Cambridge, MA, USA.

Guo, X., Gao, L., Liu, X., and Yin, J. (2017a). Improved deep embedded clustering with local structure preservation. In Proceedings of the 26th International Joint Conference on Artificial Intelligence, IJCAI’17, pages 1753–1759. AAAI Press.

Guo, X., Liu, X., Zhu, E., and Yin, J. (2017b). Deep clustering with convolutional autoencoders. In ICONIP.

Hamerly, G. and Elkan, C. (2002). Alternatives to the k-means algorithm that find better clusterings. In Proceedings of the Eleventh International Conference on Information and Knowledge Management, CIKM ’02, pages 600–607, New York, NY, USA. ACM.

Hull, J. J. (1994). A database for handwritten text recognition research. IEEE Transactions on Pattern Analysis and Machine Intelligence, 16(5):550–554.

LeCun, Y., Bottou, L., Bengio, Y., and Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324.

Li, F., Qiao, H., and Zhang, B. (2017). Discriminatively boosted image clustering with fully convolutional auto-encoders. Pattern Recognition, 83:161–173.

Li, Q., Mitianoudis, N., and Stathaki, T. (2007). Spatial kernel k-harmonic means clustering for multi-spectral image segmentation. IET Image Processing, 1(2):156–167.

Lloyd, S. P. (1982). Least squares quantization in PCM. IEEE Transactions on Information Theory, 28:129–137.

Tan, P.-N., Steinbach, M., and Kumar, V. (2016). Introduction to Data Mining, (Second Edition). Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA.

Xie, J., Girshick, R., and Farhadi, A. (2016). Unsupervised deep embedding for clustering analysis. In Balcan, M. F. and Weinberger, K. Q., editors, Proceedings of The 33rd International Conference on Machine Learning, volume 48 of Proceedings of Machine Learning Research, pages 478–487, New York, New York, USA. PMLR.

Yang, B., Fu, X., Sidiropoulos, N. D., and Hong, M. (2016). Towards k-means-friendly spaces: Simultaneous deep learning and clustering. In ICML.

Zhang, B., Hsu, M., and Dayal, U. (1999). K-harmonic means: A data clustering algorithm. Technical Report HPL1999-124, Hewlett-Packard Labs.
