Addressing Data-Intensive Computing Problems with the Use of MapReduce on Heterogeneous Environments as Desktop Grid on Slow Links
Abstract
The emergence of data volumes on the order of petabytes creates the need for new solutions that can process such data with data-intensive computing systems such as MapReduce. MapReduce is a programming framework that abstracts the parallelization process away from the programmer. However, the model is optimized primarily for large clusters and performs poorly in heterogeneous environments, where machines differ in computational capacity. The motivation of this work is to apply data-intensive computing to heterogeneous environments, such as desktop grids, using the MapReduce model. To address the deficiencies of MapReduce in heterogeneous environments, this work proposes MR-A++, a MapReduce with algorithms adapted to heterogeneous environments. MR-A++ creates a training task to gather information before the data is distributed, and this information is then used to manage the system. The small delay introduced in the setup phase of the computation is compensated by adapting to the heterogeneous environment according to the computational capacity of the machines. As a result, performance gains can exceed 70% on 10 Mbps links.
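The idea of measuring machines with a training task and then distributing data accordingly can be illustrated with a minimal sketch. This is not the MR-A++ algorithm itself, which the abstract does not specify. It assumes that the training task yields one elapsed time per machine, that speed is the inverse of that time, and that data chunks are assigned in proportion to speed. The function name `split_by_capacity` and the machine names are hypothetical.

```python
def split_by_capacity(total_chunks, training_seconds):
    """Assign data chunks in proportion to each machine's measured speed.

    Assumption (not from the paper): speed is the inverse of the time a
    machine needed to finish the training task.
    """
    speeds = {m: 1.0 / t for m, t in training_seconds.items()}
    total_speed = sum(speeds.values())

    # Ideal fractional share of chunks for each machine.
    ideal = {m: total_chunks * s / total_speed for m, s in speeds.items()}

    # Start from the rounded-down share.
    shares = {m: int(x) for m, x in ideal.items()}

    # Hand out the chunks lost to rounding, largest remainder first.
    leftover = total_chunks - sum(shares.values())
    by_remainder = sorted(ideal, key=lambda m: ideal[m] - shares[m], reverse=True)
    for m in by_remainder[:leftover]:
        shares[m] += 1
    return shares


# Hypothetical measurements: "slow" took twice as long as "fast".
plan = split_by_capacity(12, {"fast": 1.0, "slow": 2.0})
print(plan)
```

With these hypothetical measurements the faster machine receives twice as many chunks as the slower one, so both would be expected to finish at roughly the same time, at the cost of running the training task first.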
Keywords:
Computational modeling, Adaptation models, Software, Programming, Data models, Training, Delay
Published
17/10/2012
How to Cite
ANJOS, Julio C. S.; KOLBERG, Wagner; GEYER, Claudio R.; ARANTES, Luciana B. Addressing Data-Intensive Computing Problems with the Use of MapReduce on Heterogeneous Environments as Desktop Grid on Slow Links. In: SIMPÓSIO EM SISTEMAS COMPUTACIONAIS DE ALTO DESEMPENHO (SSCAD), 13., 2012, Petrópolis. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2012. p. 148-155.