Iterative Optimum-Path Forest: A Graph-Based Data Clustering Framework
Data clustering is widely recognized as a fundamental technique of paramount importance in pattern recognition and data mining. It is extensively used in many fields of the sciences, business and engineering, covering a broad spectrum of applications. Despite the large number of clustering methods, only a few of them take advantage of optimum connectivity among samples for more effective clustering. In this work, we aim to fill this gap by introducing a novel graph-based data clustering framework, called Iterative Optimum-Path Forest (IOPF), that exploits optimum connectivity for the design of improved clustering methods. The IOPF framework consists of four fundamental components: (i) sampling of a seed set S, (ii) partition of the graph induced from the dataset samples by an Optimum-Path Forest (OPF) rooted at S, (iii) recomputation of S based on the previous graph partition, and, after multiple iterations of the last two steps, (iv) selection of the forest with the lowest total cost across all iterations. IOPF can be regarded as a generalization of the Iterative Spanning Forest (ISF) framework for superpixel segmentation from the image domain to the feature space. Herein, we present four IOPF-based clustering solutions to illustrate distinct choices of its constituent components. These are thereafter employed to address three different problems, namely, unsupervised object segmentation, road network analysis and clustering of synthetic two-dimensional datasets, in order to assess their effectiveness under various graph topologies, and to ascertain their efficacy and robustness when compared to competitive baselines.
A. K. Jain, "Data clustering: 50 years beyond k-means," Pattern recognition letters, vol. 31, no. 8, pp. 651-666, 2010.
J. P. Ortega, M. Del, R. B. Rojas, and M. J. Somodevilla, "Research issues on k-means algorithm: An experimental trial using matlab," in CEUR workshop proceedings: semantic web and new technologies, 2009, pp. 83-96.
D. Pelleg, A. W. Moore et al., "X-means: Extending k-means with efficient estimation of the number of clusters." in Icml, vol. 1, 2000, pp. 727-734.
R. M. Esteves, T. Hacker, and C. Rong, "Competitive k-means, a new accurate and distributed k-means algorithm for large datasets," in 2013 IEEE 5th International Conference on Cloud Computing Technology and Science, vol. 1, 2013, pp. 17-24.
J. P. Papa, A. X. Falcão, and C. T. N. Suzuki, "Supervised pattern classification based on optimum-path forest," International Journal of Imaging Systems and Technology, vol. 19, no. 2, pp. 120-131, 2009. [Online]. Available: https://onlinelibrary.wiley.com/doi/abs/10.1002/ima.20188
J. Bragantini, S. B. Martins, C. Castelo-Fernandez, and A. X. Falcão, "Graph-based image segmentation using dynamic trees," in Iberoamerican Congress on Pattern Recognition. Springer, 2018, pp. 470-478.
F. C. Belém, S. J. F. Guimarães, and A. X. Falcão, "Superpixel segmentation using dynamic and iterative spanning forest," IEEE Signal Processing Letters, vol. 27, pp. 1440-1444, 2020.
S. Soor, A. Challa, S. Danda, B. Daya Sagar, and L. Najman, "Iterated watersheds, a connected variation of k-means for clustering gis data," IEEE Transactions on Emerging Topics in Computing, pp. 1-1, 2019.
A. X. Falcão, J. Stolfi, and R. de Alencar Lotufo, "The image foresting transform: theory, algorithms, and applications," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 26, no. 1, pp. 19-29, 2004.
D. Aparco-Cardenas, P. J. de Rezende, and A. X. Falcão, "Object delineation by iterative dynamic trees," in Iberoamerican Congress on Pattern Recognition, 2021, pp. 1-10.
D. Aparco-Cardenas, P. J. de Rezende, and A. X. Falcão, "Chapter 8 - An iterative optimum-path forest framework for clustering," in Optimum-Path Forest, A. X. Falcão and J. P. Papa, Eds. Academic Press, 2022, pp. 175-216. [Online]. Available: https://www.sciencedirect.com/science/article/pii/B9780128226889000165
J. E. Vargas-Muñoz, A. S. Chowdhury, E. B. Alexandre, F. L. Galvão, P. A. Vechiatto Miranda, and A. X. Falcão, "An iterative spanning forest
L. M. Rocha, F. A. Cappabianco, and A. X. Falcão, "Data clustering as an optimum-path forest problem with applications in image analysis," International Journal of Imaging Systems and Technology, vol. 19, no. 2, pp. 50-68, 2009.
K. A. Costa, L. A. Pereira, R. Y. Nakamura, C. R. Pereira, J. P. Papa, and A. X. Falcão, "A nature-inspired approach to speed up optimum-path forest clustering and its application to intrusion detection in computer networks," Information Sciences, vol. 294, pp. 95-108, 2015.
F. A. Cappabianco, A. X. Falcão, C. L. Yasuda, and J. K. Udupa, "Brain tissue mr-image segmentation via optimum-path forest clustering," Computer Vision and Image Understanding, vol. 116, no. 10, pp. 1047-1059, 2012.
A. Echemendía Montero and A. X. Falcão, "A divide-and-conquer clustering approach based on optimum-path forest," in 2018 31st SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI), 2018, pp. 416-423.
S. Chen, T. Sun, F. Yang, H. Sun, and Y. Guan, "An improved optimum-path forest clustering algorithm for remote sensing image segmentation," Computers & Geosciences, vol. 112, pp. 38-46, 2018.
L. Afonso, A. Vidal, M. Kuroda, A. X. Falcão, and J. P. Papa, "Learning to classify seismic images with deep optimum-path forest," in 2016 29th SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI), 2016, pp. 401-407.
L. C. Afonso, C. R. Pereira, S. A.Weber, C. Hook, A. X. Falcão, and J. P. Papa, "Hierarchical learning using deep optimum-path forest," Journal of Visual Communication and Image Representation, vol. 71, p. 102823, 2020. framework for superpixel segmentation," IEEE Transactions on Image Processing, vol. 28, no. 7, pp. 3477-3489, 2019.
S. Alpert, M. Galun, R. Basri, and A. Brandt, "Image segmentation by probabilistic bottom-up aggregation and cue integration." in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, June 2007.
A. Karduni, A. Kermanshah, and S. Derrible, "A protocol to convert spatial polyline data to network formats and applications to world urban road networks," Scientific data, vol. 3, no. 1, pp. 1-7, 2016.
D. Aparco-Cardenas, "Floresta de Caminhos Ótimos Iterativa: Um Arcabouço para Agrupamento de Dados Baseado em Grafos. MSc thesis, Universidade Estadual de Campinas," Master's thesis, Universidade Estadual de Campinas, 2021.
P. Fränti and S. Sieranoja, "Clustering basic benchmark," http://cs.joensuu.fi/sipu/datasets/, accessed: 2020-09-30.