Impact of uniform random sampling to increase scalability in the generation of hierarchical clusters of spatiotemporal series

  • Rodolfo M. S. Mendes, Federal University of Uberlândia
  • Humberto Razente, Federal University of Uberlândia
  • Maria Camila N. Barioni, Federal University of Uberlândia
  • Luciana Alvim Santos Romani, Embrapa Agricultural Informatics

Abstract


This paper presents the results of a scalable approach to building hierarchical clusterings of space-time series. The goal is to reduce complexity in terms of both space and time. The approach explores data sampling as a pre-processing technique to reduce the numerosity of the data. The experiment indicates that strategies more efficient than naive sample selection (uniform sampling) need to be developed.
Keywords: Hierarchical clustering, Space-time series
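
The pipeline summarized in the abstract (draw a uniform random sample of the series, then build the hierarchy on the sample only) can be illustrated with a minimal sketch. It is not the authors' implementation: the Euclidean distance, the average-linkage criterion, the toy data, and every function name below are assumptions chosen for illustration. The motivation is that a naive agglomerative method needs all pairwise distances, so keeping a fraction f of n series shrinks the distance matrix from n^2 to (f*n)^2 entries.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length series (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def uniform_sample(series, fraction, seed=0):
    """Uniform random sample without replacement; keeps at least 2 series."""
    rng = random.Random(seed)
    k = max(2, round(len(series) * fraction))
    return rng.sample(series, min(k, len(series)))


def average_linkage(series):
    """Naive agglomerative clustering with average linkage (assumed criterion).

    Returns the merge history as (members_a, members_b, distance) tuples,
    where members are tuples of indices into `series`.
    """
    n = len(series)
    # Full pairwise distance matrix: the quadratic cost that sampling reduces.
    dist = [[euclidean(series[i], series[j]) for j in range(n)] for i in range(n)]
    clusters = [(i,) for i in range(n)]
    merges = []
    while len(clusters) > 1:
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                a, b = clusters[p], clusters[q]
                d = sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))
                if best is None or d < best[0]:
                    best = (d, p, q)
        d, p, q = best
        merges.append((clusters[p], clusters[q], d))
        merged = clusters[p] + clusters[q]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (p, q)]
        clusters.append(merged)
    return merges


def make_toy_series(n_per_group=20, length=12, seed=1):
    """Hypothetical data: two groups of noisy constant series."""
    rng = random.Random(seed)
    low = [[rng.gauss(0.0, 0.1) for _ in range(length)] for _ in range(n_per_group)]
    high = [[rng.gauss(5.0, 0.1) for _ in range(length)] for _ in range(n_per_group)]
    return low + high


data = make_toy_series()                      # 40 series
sample = uniform_sample(data, fraction=0.25)  # 10 series kept
dendrogram = average_linkage(sample)          # hierarchy built on the sample only
```

With uniform sampling every series has the same chance of being kept, so small or sparse groups can be missed entirely; this is one plausible reason a more informed selection strategy may be needed.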

Published
2016-10-04
MENDES, Rodolfo M. S.; RAZENTE, Humberto; BARIONI, Maria Camila N.; ROMANI, Luciana Alvim Santos. Impact of uniform random sampling to increase scalability in the generation of hierarchical clusters of spatiotemporal series. In: BRAZILIAN SYMPOSIUM ON DATABASES (SBBD), 31., 2016, Salvador/BA. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2016. p. 193-198. ISSN 2763-8979. DOI: https://doi.org/10.5753/sbbd.2016.24327.