Anthill: a scalable run-time environment for data mining applications

  • R. A. Ferreira UFMG
  • W. Meira UFMG
  • D. Guedes UFMG
  • L. M. A. Drummond UFF
  • B. Coutinho UFMG
  • G. Teodoro UFMG
  • T. Tavares UFMG
  • R. Araujo UFMG
  • G. T. Ferreira UFMG

Abstract


Data mining techniques are becoming increasingly popular as a reasonable means of collecting summaries from the rapidly growing datasets in many areas. However, as the size of the raw data increases, parallel data mining algorithms are becoming a necessity. In this paper, we present a run-time support system designed to allow the efficient implementation of data mining algorithms in heterogeneous distributed environments. We believe that the run-time framework is suitable for a broader class of applications, beyond data mining. We also present a parallelization strategy that is supported by the run-time system. We show scalability results for three different data mining algorithms that were parallelized using our approach and our run-time support. All applications scale almost linearly up to a large number of nodes.
Keywords: Runtime environment, Data mining, Application software, Clustering algorithms, Algorithm design and analysis, Scalability, Computer science, Costs, Memory, Data analysis
Published
24/10/2005
FERREIRA, R. A. et al. Anthill: a scalable run-time environment for data mining applications. In: INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE AND HIGH PERFORMANCE COMPUTING (SBAC-PAD), 17., 2005, Rio de Janeiro/RJ. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2005. p. 159-166.