Data stream clustering using complex networks

  • Danilo A. Nunes Universidade Federal de Uberlândia
  • Murillo G. Carneiro Universidade Federal de Uberlândia

Abstract


Data stream clustering is a crucial machine learning task for the many systems that generate data continuously and need to analyze it without interruption. Using resources provided by the MOA (Massive Online Analysis) platform, this research applies learning models based on complex networks to the offline phase of CluStream, the phase in which micro-clusters are grouped by the k-Means algorithm. The experiments considered three datasets and several performance measures specific to the problem. The results show that complex networks achieve competitive performance and outperform the traditional k-Means method in several scenarios.

Keywords: Data Streams, Clustering, Machine Learning, Complex Networks
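
The idea of replacing k-Means in the offline phase with a network-based model can be illustrated with a minimal sketch. The abstract does not say which graph construction or community detection method the authors use, so everything below is an assumption: the micro-cluster centers are taken as plain coordinate tuples (weights and radii are ignored), the network is a k-nearest-neighbour graph, and communities are found with a simplified greedy modularity agglomeration in the spirit of Newman (2004). The function names `knn_graph`, `greedy_modularity`, and `macro_clusters` are hypothetical and are not part of MOA.

```python
import math


def knn_graph(points, k):
    """Build an undirected k-nearest-neighbour graph (assumed construction).

    Returns a set of edges (i, j) with i < j over the point indices.
    """
    edges = set()
    for i, p in enumerate(points):
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        for j in others[:k]:
            edges.add((min(i, j), max(i, j)))
    return edges


def greedy_modularity(n, edges):
    """Simplified greedy modularity agglomeration.

    Starts with one community per node and repeatedly merges the pair of
    connected communities with the largest positive modularity gain
        dQ = e_ab / m - d_a * d_b / (2 * m^2),
    where e_ab is the number of edges between the two communities, d_a and
    d_b are their total degrees, and m is the number of edges.
    Returns one community label per node.
    """
    m = len(edges)
    label = list(range(n))
    degree = [0] * n
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1
    while True:
        comm_degree = {}
        for node, c in enumerate(label):
            comm_degree[c] = comm_degree.get(c, 0) + degree[node]
        between = {}
        for i, j in edges:
            a, b = label[i], label[j]
            if a != b:
                key = (min(a, b), max(a, b))
                between[key] = between.get(key, 0) + 1
        best_gain, best_pair = 0.0, None
        for (a, b), e_ab in sorted(between.items()):
            gain = e_ab / m - comm_degree[a] * comm_degree[b] / (2 * m * m)
            if gain > best_gain + 1e-12:  # ties keep the first pair found
                best_gain, best_pair = gain, (a, b)
        if best_pair is None:
            return label
        a, b = best_pair
        label = [a if c == b else c for c in label]


def macro_clusters(centers, k=2):
    """Group micro-cluster centers into macro-clusters (illustrative only)."""
    return greedy_modularity(len(centers), knn_graph(centers, k))


# Hypothetical micro-cluster centers forming two well-separated groups.
centers = [(0, 0), (0, 1), (1, 0), (1, 1),
           (10, 10), (10, 11), (11, 10), (11, 11)]
labels = macro_clusters(centers, k=2)
```

Unlike k-Means, this kind of approach does not require the number of macro-clusters in advance: it falls out of the community structure of the graph. The neighbourhood size `k` takes over as the main parameter, and a real implementation would likely also use the micro-cluster weights.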

Published
25/09/2023
NUNES, Danilo A.; CARNEIRO, Murillo G. Data stream clustering using complex networks. In: ENCONTRO NACIONAL DE INTELIGÊNCIA ARTIFICIAL E COMPUTACIONAL (ENIAC), 20., 2023, Belo Horizonte/MG. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2023. p. 1210-1224. ISSN 2763-9061. DOI: https://doi.org/10.5753/eniac.2023.234716.