Um Estudo Comparativo entre Algoritmos de Agrupamentos de Dados Usando a Ferramenta YADMT

  • Narciso F. Sousa UFRN
  • Flavius L. Gorgônio UFRN
  • Huliane M. Silva IFRN

Abstract


The need to transform data into information, and information into knowledge, led to the emergence of data mining, a field whose objective is to provide techniques for interpreting large volumes of data. Although current computational tools can process huge volumes of data in a matter of seconds, real-world applications tend to be far more complex, and their databases far more challenging, than those commonly presented in the literature. This work presents a comparative study of data clustering algorithms, run in the Yet Another Data Mining Tool (YADMT) on databases from the Fundamental Clustering Problems Suite (FCPS), which simulate various situations found in real-world problems. The algorithms chosen for this research were ant colony clustering, k-means, self-organizing maps, and hierarchical methods. They were evaluated using the F-Measure, the R-Index, and the intra-group variance.
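As an illustration of the three evaluation measures named above, the following is a minimal Python sketch, not the YADMT implementation. It rests on assumptions that the abstract does not confirm: that the R-Index is the Rand index, that the F-Measure is the class-weighted best-match variant commonly used for clustering, and that intra-group variance is the sum of squared Euclidean distances from each point to its cluster centroid. The function names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def rand_index(truth, pred):
    """Assumed meaning of "R-Index": fraction of point pairs on which
    the reference partition and the clustering agree (Rand index)."""
    pairs = list(combinations(range(len(truth)), 2))
    agree = sum(
        (truth[i] == truth[j]) == (pred[i] == pred[j]) for i, j in pairs
    )
    return agree / len(pairs)


def f_measure(truth, pred):
    """Assumed variant: for each reference class take the best F score
    over all clusters, then average weighted by class size."""
    n = len(truth)
    class_sizes = Counter(truth)
    cluster_sizes = Counter(pred)
    overlap = Counter(zip(truth, pred))
    total = 0.0
    for c, n_c in class_sizes.items():
        best = 0.0
        for k, n_k in cluster_sizes.items():
            n_ck = overlap[(c, k)]
            if n_ck:
                precision = n_ck / n_k
                recall = n_ck / n_c
                best = max(best, 2 * precision * recall / (precision + recall))
        total += (n_c / n) * best
    return total


def intra_group_variance(points, pred):
    """Assumed definition: sum of squared Euclidean distances from
    each point to the centroid of its assigned cluster."""
    groups = defaultdict(list)
    for p, k in zip(points, pred):
        groups[k].append(p)
    total = 0.0
    for members in groups.values():
        centroid = [sum(col) / len(members) for col in zip(*members)]
        total += sum(
            sum((x - c) ** 2 for x, c in zip(p, centroid)) for p in members
        )
    return total
```

Under these definitions, a clustering that matches the reference partition up to a renaming of cluster labels scores 1.0 on both `rand_index` and `f_measure`, while a lower `intra_group_variance` indicates more compact groups.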

Published
2021-09-14
SOUSA, Narciso F.; GORGÔNIO, Flavius L.; SILVA, Huliane M. Um Estudo Comparativo entre Algoritmos de Agrupamentos de Dados Usando a Ferramenta YADMT. In: REGIONAL SCHOOL ON COMPUTING OF CEARÁ, MARANHÃO, AND PIAUÍ (ERCEMAPI), 9., 2021, Quixadá/CE. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2021. p. 82-90. DOI: https://doi.org/10.5753/ercemapi.2021.17911.