A practical analysis of balancing policies for rearranging data replicas in HDFS clusters

  • Rhauani Weber Aita Fazul (UFSM)
  • Patrícia Pitthan Barcelos (UFSM)


Data replication is the main fault-tolerance mechanism implemented by the Hadoop Distributed File System (HDFS). How replicas are placed across the nodes directly influences replica balancing and data locality, both of which are essential for high reliability and data availability. The HDFS Balancer is the official tool for balancing replicas by redistributing data. In this work, we conducted a practical experiment to evaluate three policies for rearranging replicas: datanode, blockpool, and custom. The evaluation results highlight the behavior and effectiveness of each policy. We also investigated the cost of running the HDFS Balancer and the performance and availability gains that a balanced replica distribution provides.
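To make the balancing criterion concrete, here is a minimal sketch of the per-node check behind the built-in datanode policy. It assumes the rule described in the Hadoop documentation. A datanode counts as balanced when its storage utilization (percentage of capacity used) differs from the cluster-wide average by no more than a threshold, which defaults to 10 percentage points (set with `hdfs balancer -threshold <threshold> -policy <policy>`). Under the datanode policy, the cluster is balanced when every datanode is balanced. The blockpool policy applies the same test to each block pool on each datanode. The names `DataNode`, `classify`, and `is_balanced` are hypothetical illustrations, not the Balancer's actual API. The paper's custom policy is not modeled here.

```python
from dataclasses import dataclass


@dataclass
class DataNode:
    # Hypothetical model of a datanode: only what the utilization check needs.
    name: str
    used_bytes: int
    capacity_bytes: int

    @property
    def utilization(self) -> float:
        # Percentage of this node's capacity currently used.
        return 100.0 * self.used_bytes / self.capacity_bytes


def classify(nodes, threshold=10.0):
    """Group nodes by distance from the cluster-average utilization (sketch)."""
    total_used = sum(n.used_bytes for n in nodes)
    total_cap = sum(n.capacity_bytes for n in nodes)
    avg = 100.0 * total_used / total_cap
    groups = {"over": [], "above_avg": [], "below_avg": [], "under": []}
    for n in nodes:
        u = n.utilization
        if u > avg + threshold:
            groups["over"].append(n.name)        # candidate source of moves
        elif u > avg:
            groups["above_avg"].append(n.name)   # above average, within threshold
        elif u >= avg - threshold:
            groups["below_avg"].append(n.name)   # at/below average, within threshold
        else:
            groups["under"].append(n.name)       # candidate target of moves
    return avg, groups


def is_balanced(nodes, threshold=10.0):
    # Datanode-policy style check: no node lies outside avg +/- threshold.
    _, groups = classify(nodes, threshold)
    return not groups["over"] and not groups["under"]


if __name__ == "__main__":
    cluster = [DataNode("dn1", 90, 100), DataNode("dn2", 50, 100), DataNode("dn3", 10, 100)]
    avg, groups = classify(cluster)
    print(avg, groups, is_balanced(cluster))
```

In this example, the average utilization is 50%. `dn1` (90%) is over-utilized and `dn3` (10%) is under-utilized, so the cluster is not balanced. A balancer would move replicas from `dn1` toward `dn3` until both fall within the threshold band.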


FAZUL, Rhauani Weber Aita; BARCELOS, Patrícia Pitthan. A practical analysis of balancing policies for rearranging data replicas in HDFS clusters. In: SIMPÓSIO EM SISTEMAS COMPUTACIONAIS DE ALTO DESEMPENHO (WSCAD), 23., 2022, Florianópolis. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2022. p. 121-132. DOI: https://doi.org/10.5753/wscad.2022.225856.