Modeling the Performance of the Hadoop Online Prototype

Emanuel Vianna; Giovanni Comarela; Tatiana Pontes; Jussara Almeida; Virgilio Almeida; Kevin Wilkinson; Harumi Kuno; Umeshwar Dayal

Emanuel Vianna UFMG
Giovanni Comarela UFMG
Tatiana Pontes UFMG
Jussara Almeida UFMG
Virgilio Almeida UFMG
Kevin Wilkinson Hewlett Packard Laboratories
Harumi Kuno Hewlett Packard Laboratories
Umeshwar Dayal Hewlett Packard Laboratories

Abstract

MapReduce is an important paradigm for supporting modern data-intensive applications. In this paper we address the challenge of modeling the performance of one MapReduce implementation, the Hadoop Online Prototype (HOP), focusing specifically on intra-job pipeline parallelism. We use a hierarchical model that combines a precedence model with a queuing network model to capture intra-job synchronization constraints. We first show how to build a precedence graph that represents the dependencies among the tasks of a single job. We then apply it jointly with an approximate Mean Value Analysis (aMVA) solution to predict mean job response time and resource utilization. We validate our solution against a queuing network simulator in various scenarios and find that the model's predictions agree closely with the simulation results, with a maximum relative difference under 15%.
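For readers unfamiliar with approximate MVA, the following is a minimal sketch of the textbook Schweitzer-Bard fixed-point iteration for a single-class closed queuing network. It is not the hierarchical model of the paper: it ignores the precedence graph and the intra-job synchronization delays, and the function name, parameters, and the example service demands are all illustrative assumptions.

```python
def approximate_mva(demands, n_customers, think_time=0.0,
                    tol=1e-8, max_iter=10_000):
    """Schweitzer-Bard approximate MVA, single class, queuing centers only.

    demands      -- service demand D_i (seconds) at each center (assumed inputs)
    n_customers  -- population N of the closed network
    think_time   -- delay (think) time Z outside the queuing centers
    Returns (mean response time, throughput, per-center utilization).
    """
    k = len(demands)
    if n_customers <= 0:
        return 0.0, 0.0, [0.0] * k

    # Initial guess: customers spread evenly over the centers.
    queue = [n_customers / k] * k
    response, throughput = 0.0, 0.0

    for _ in range(max_iter):
        # Schweitzer approximation: an arriving customer sees the
        # steady-state queue scaled by (N - 1) / N.
        scale = (n_customers - 1) / n_customers
        residence = [d * (1.0 + scale * q) for d, q in zip(demands, queue)]
        response = sum(residence)

        # Little's law applied to the whole network.
        throughput = n_customers / (think_time + response)

        # Little's law applied to each center.
        new_queue = [throughput * r for r in residence]

        converged = max(abs(a - b) for a, b in zip(new_queue, queue)) < tol
        queue = new_queue
        if converged:
            break

    utilization = [throughput * d for d in demands]
    return response, throughput, utilization


if __name__ == "__main__":
    # Hypothetical demands for a CPU and a disk, with 10 concurrent tasks.
    r, x, u = approximate_mva([0.2, 0.3], n_customers=10)
    print(f"response={r:.3f}s throughput={x:.3f}/s utilization={u}")
```

With a single customer the approximation reduces to exact MVA, since there is no queueing: the response time is simply the sum of the service demands. For larger populations the iteration trades some accuracy for a cost that does not grow with the population size.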

Keywords: Time factors, Pipelines, Computational modeling, Parallel processing, Synchronization, Analytical models, Delay, pipeline parallelism, hadoop online prototype, analytical model, task graph, queuing network, simulation