Data stream clustering using complex networks

  • Danilo A. Nunes, Federal University of Uberlândia
  • Murillo G. Carneiro, Federal University of Uberlândia

Abstract


Data stream clustering is a crucial machine learning task for systems that generate data continuously and require uninterrupted analysis. Using resources offered by the MOA (Massive Online Analysis) platform, this research applies learning models based on complex networks in the offline phase of CluStream, in which micro-clusters are conventionally grouped using the k-Means algorithm. The experiments considered three datasets and several performance measures specific to the problem. The results showed that complex networks achieve competitive performance, outperforming the traditional k-Means method in many scenarios.
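
As a rough illustration of the idea, the offline step can be pictured as building a graph over micro-cluster centres and letting a community detection routine group them, in place of k-Means. The sketch below is not the authors' implementation: the abstract does not state which graph construction or community detection method was used, and the work itself relies on MOA rather than Python. The k-nearest-neighbour graph, the greedy modularity merge rule (the one used in Newman's fast community detection algorithm), and the names knn_graph and greedy_modularity are all assumptions chosen for illustration.

```python
import math


def knn_graph(points, k):
    """Undirected k-nearest-neighbour graph over micro-cluster centres (assumed construction)."""
    edges = set()
    for i, p in enumerate(points):
        # Rank every other centre by Euclidean distance to p and keep the k closest.
        others = sorted((j for j in range(len(points)) if j != i),
                        key=lambda j: math.dist(p, points[j]))
        for j in others[:k]:
            edges.add((min(i, j), max(i, j)))
    return edges


def greedy_modularity(n, edges):
    """Merge communities greedily while a merge still increases modularity."""
    m = len(edges)
    communities = {i: {i} for i in range(n)}
    # frac[(a, b)], a < b: fraction of all edges running between communities a and b.
    frac = {}
    # ends[c]: fraction of all edge ends attached to community c.
    ends = {i: 0.0 for i in range(n)}
    for i, j in sorted(edges):
        frac[(i, j)] = frac.get((i, j), 0.0) + 1.0 / m
        ends[i] += 0.5 / m
        ends[j] += 0.5 / m

    while True:
        best_gain, best_pair = 1e-12, None
        for (c1, c2), f in frac.items():
            # Modularity change if c1 and c2 were merged.
            gain = f - 2.0 * ends[c1] * ends[c2]
            if gain > best_gain:
                best_gain, best_pair = gain, (c1, c2)
        if best_pair is None:
            break  # no remaining merge improves modularity
        c1, c2 = best_pair
        communities[c1] |= communities.pop(c2)
        ends[c1] += ends.pop(c2)
        # Relabel c2 as c1 and drop edges that are now internal to c1.
        merged = {}
        for (x, y), f in frac.items():
            x = c1 if x == c2 else x
            y = c1 if y == c2 else y
            if x != y:
                key = (min(x, y), max(x, y))
                merged[key] = merged.get(key, 0.0) + f
        frac = merged
    return list(communities.values())


# Hypothetical micro-cluster centres: two small squares far apart.
centres = [(0, 0), (0, 1), (1, 0), (1, 1),
           (10, 10), (10, 11), (11, 10), (11, 11)]
groups = greedy_modularity(len(centres), knn_graph(centres, k=2))
print(sorted(sorted(g) for g in groups))
```

On this toy input the two squares come back as two separate groups. Unlike k-Means, the number of groups is not supplied in advance; it falls out of where modularity stops improving, which is one reason a network-based offline phase can behave differently from the k-Means one.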

Keywords: Data Stream, Clustering, Machine Learning, Complex Networks

Published
2023-09-25
NUNES, Danilo A.; CARNEIRO, Murillo G. Data stream clustering using complex networks. In: NATIONAL MEETING ON ARTIFICIAL AND COMPUTATIONAL INTELLIGENCE (ENIAC), 20., 2023, Belo Horizonte/MG. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2023. p. 1210-1224. ISSN 2763-9061. DOI: https://doi.org/10.5753/eniac.2023.234716.