Combining Proactive and Reactive Replica Balancing Approaches in HDFS
Abstract
As new data is loaded into the system, the distribution of replicas among the nodes commonly becomes unbalanced. The HDFS Balancer is the standard data balancing solution; it rearranges the replicas already stored in the cluster. However, its current operation depends on manual intervention and does not consider the specific needs of the applications running in the cluster. To address these limitations, this work explores a balancing solution that combines proactive and reactive approaches, acting both in the pre-operational stage and during the execution of the HDFS Balancer. The evaluation results show that the solution improves performance through replica rearrangement while taking reliability and availability attributes into account.