Experimentation and Analysis of Dynamic Checkpoint on Apache Hadoop with Failure Scenarios

  • Paulo Vinicius Cardoso UFSM
  • Patrícia Pitthan Barcelos UFSM

Abstract


The growth of reliability problems in high-performance systems has motivated the search for fault tolerance mechanisms. The Apache Hadoop framework, created to store and process large amounts of data, implements Checkpoint and Recovery to aid the recovery of its distributed file system (Hadoop Distributed File System, HDFS) in the presence of failures. However, since configuration attributes cannot be changed at runtime, poor choices may cause performance and reliability problems. This work uses a dynamic configuration mechanism for checkpointing in Hadoop and evaluates its performance in scenarios with faults induced on the HDFS master element (the NameNode).
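The sketch below illustrates why static checkpoint settings matter. It models the checkpoint trigger as HDFS is commonly configured: a checkpoint of the NameNode namespace is taken when either a time period (`dfs.namenode.checkpoint.period`, default 3600 s) or a count of uncheckpointed transactions (`dfs.namenode.checkpoint.txns`, default 1,000,000) is reached, with both values read from configuration when the daemon starts. This is a simplified model under those assumptions. It is not Hadoop's code and not the dynamic mechanism evaluated in this work, and the helper names (`CheckpointConfig`, `should_checkpoint`, `edits_between_checkpoints`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CheckpointConfig:
    """Hypothetical stand-in for the two HDFS checkpoint thresholds.

    frozen=True mimics values that are fixed once the daemon has started.
    """
    period_s: int = 3600      # assumed analogue of dfs.namenode.checkpoint.period
    txns: int = 1_000_000     # assumed analogue of dfs.namenode.checkpoint.txns


def should_checkpoint(cfg: CheckpointConfig,
                      secs_since_last: float,
                      uncheckpointed_txns: int) -> bool:
    """Return True once either static threshold has been reached."""
    return secs_since_last >= cfg.period_s or uncheckpointed_txns >= cfg.txns


def edits_between_checkpoints(cfg: CheckpointConfig, txn_rate_per_s: float) -> int:
    """Simplified estimate of edit-log transactions accumulated per checkpoint
    cycle at a steady rate; ignores how often the trigger is actually polled."""
    return min(cfg.txns, int(cfg.period_s * txn_rate_per_s))


if __name__ == "__main__":
    cfg = CheckpointConfig()
    print(should_checkpoint(cfg, 1200, 50_000))    # neither threshold reached
    print(edits_between_checkpoints(cfg, 100.0))   # period fires first
    print(edits_between_checkpoints(cfg, 1000.0))  # txn threshold fires first
```

With a steady 100 transactions per second, the default period fires first and about 360,000 edits accumulate between checkpoints; at 1,000 per second the transaction threshold caps this at 1,000,000. Because the frozen configuration object cannot be altered after construction, changing either threshold means building a new object, a rough analogue of restarting the daemon.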
Keywords: Cluster computing, Monitoring, Heartbeat, File systems, Fault tolerance, Fault tolerant systems, Apache Hadoop, HDFS, checkpoint and recovery, dynamic configuration, performance evaluation
Published
01/10/2018
CARDOSO, Paulo Vinicius; BARCELOS, Patrícia Pitthan. Experimentation and Analysis of Dynamic Checkpoint on Apache Hadoop with Failure Scenarios. In: SIMPÓSIO EM SISTEMAS COMPUTACIONAIS DE ALTO DESEMPENHO (SSCAD), 19., 2018, São Paulo. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2018. p. 170-176.