Modeling the Performance of the Hadoop Online Prototype
Abstract
MapReduce is an important paradigm for supporting modern data-intensive applications. In this paper we address the challenge of modeling the performance of one MapReduce implementation, the Hadoop Online Prototype (HOP), focusing on intra-job pipeline parallelism. We use a hierarchical model that combines a precedence model and a queuing network model to capture the intra-job synchronization constraints. We first show how to build a precedence graph that represents the dependencies among multiple tasks of the same job. We then apply it jointly with an approximate Mean Value Analysis (aMVA) solution to predict mean job response time and resource utilization. We validate our solution against a queuing network simulator in various scenarios and find close agreement, with a maximum relative difference under 15%.
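To make the aMVA step concrete, here is a minimal sketch of one common approximate MVA variant, the Schweitzer fixed-point iteration, for a single-class closed queuing network. It is not the paper's hierarchical model: it ignores the precedence graph and the intra-job synchronization constraints, and the abstract does not say which aMVA variant the authors use. The function name `schweitzer_amva`, its parameters, and the service demands in the usage note are all hypothetical.

```python
from typing import List, Sequence, Tuple


def schweitzer_amva(
    demands: Sequence[float],
    n_jobs: int,
    think_time: float = 0.0,
    tol: float = 1e-9,
    max_iter: int = 10_000,
) -> Tuple[float, float, List[float], List[float]]:
    """Schweitzer approximate MVA for a single-class closed network.

    demands    -- service demand of one job at each queuing station (hypothetical inputs)
    n_jobs     -- number of jobs circulating in the network
    think_time -- delay outside the queuing stations
    Returns (mean response time, throughput, utilizations, mean queue lengths).
    """
    if n_jobs < 1:
        raise ValueError("n_jobs must be at least 1")
    if not demands or any(d < 0 for d in demands) or sum(demands) <= 0:
        raise ValueError("demands must be non-negative with a positive sum")

    k = len(demands)
    # Initial guess: jobs spread evenly across the stations.
    queue = [n_jobs / k] * k

    for _ in range(max_iter):
        # Schweitzer approximation: an arriving job sees (N - 1) / N of the
        # mean queue length that the full population of N jobs produces.
        resid = [
            d * (1.0 + (n_jobs - 1) / n_jobs * q)
            for d, q in zip(demands, queue)
        ]
        # Little's law applied to the whole network.
        throughput = n_jobs / (think_time + sum(resid))
        # Little's law applied to each station.
        new_queue = [throughput * r for r in resid]
        converged = max(abs(a - b) for a, b in zip(new_queue, queue)) < tol
        queue = new_queue
        if converged:
            break

    # Utilization law: U_k = X * D_k.
    utilization = [throughput * d for d in demands]
    return sum(resid), throughput, utilization, queue
```

Calling `schweitzer_amva([0.1, 0.2], n_jobs=10)` with these made-up demands returns the predicted mean response time, throughput, per-station utilization, and mean queue lengths. A fixed-point approximation of this kind avoids the population-by-population recursion of exact MVA, which is why approximate variants are often preferred when the solver must be invoked repeatedly inside a larger iterative model.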
Keywords:
Time factors, Pipelines, Computational modeling, Parallel processing, Synchronization, Analytical models, Delay, pipeline parallelism, hadoop online prototype, analytical model, task graph, queuing network, simulation
Published
26/10/2011
How to Cite
VIANNA, Emanuel; COMARELA, Giovanni; PONTES, Tatiana; ALMEIDA, Jussara; ALMEIDA, Virgilio; WILKINSON, Kevin; KUNO, Harumi; DAYAL, Umeshwar.
Modeling the Performance of the Hadoop Online Prototype. In: INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE AND HIGH PERFORMANCE COMPUTING (SBAC-PAD), 23., 2011, Vitória/ES. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2011. p. 152-159.
