Dynamic Architecture for Data Replica Balancing in HDFS: Stability, Efficiency, and Data Locality Evaluations

  • Rhauani Weber Aita Fazul (UFSC)
  • Odorico Machado Mendizabal (UFSC)
  • Patrícia Pitthan Barcelos (UFSM)

Abstract


Hadoop Distributed File System (HDFS) is known for its specialized strategies and policies for replica placement. This capability is critical for efficient and reliable access to data replicas, particularly because HDFS performs best when data are evenly distributed across the cluster. In this study, we analyze the replica balancing process in HDFS in depth, focusing on two critical performance metrics: stability and efficiency. We evaluate these aspects by comparing conventional HDFS balancing solutions with a novel dynamic architecture for data replica balancing. We also examine the data locality improvements that effective replica balancing brings and their benefits for data-intensive applications.
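To make "evenly distributed" concrete, the following minimal Python sketch approximates the criterion used by the conventional HDFS Balancer: each DataNode's utilization (used space over capacity) is compared with the cluster-wide average, and a node counts as balanced when it lies within a threshold of that average (the `hdfs balancer -threshold` option, 10 percentage points by default). The `DataNode` class, the function names, and the sample numbers are hypothetical. The real Balancer also works per storage type and then schedules block moves from the over-utilized to the under-utilized groups, which this sketch omits, and it does not represent the dynamic architecture evaluated in the paper.

```python
from dataclasses import dataclass


@dataclass
class DataNode:
    # Hypothetical, simplified view of a DataNode's storage report.
    name: str
    used_gb: float
    capacity_gb: float

    @property
    def utilization(self) -> float:
        """Percentage of this node's capacity currently in use."""
        return 100.0 * self.used_gb / self.capacity_gb


def cluster_utilization(nodes: list[DataNode]) -> float:
    """Cluster average: total used space over total capacity, in percent."""
    total_used = sum(n.used_gb for n in nodes)
    total_capacity = sum(n.capacity_gb for n in nodes)
    return 100.0 * total_used / total_capacity


def classify(nodes: list[DataNode], threshold: float = 10.0) -> dict[str, str]:
    """Label each node relative to the cluster average +/- threshold (percentage points)."""
    avg = cluster_utilization(nodes)
    labels: dict[str, str] = {}
    for n in nodes:
        u = n.utilization
        if u > avg + threshold:
            labels[n.name] = "over-utilized"   # candidate source of block moves
        elif u > avg:
            labels[n.name] = "above-average"
        elif u >= avg - threshold:
            labels[n.name] = "below-average"   # a node exactly at the average lands here
        else:
            labels[n.name] = "under-utilized"  # candidate target of block moves
    return labels


def is_balanced(nodes: list[DataNode], threshold: float = 10.0) -> bool:
    """True when every node's utilization is within threshold of the cluster average."""
    avg = cluster_utilization(nodes)
    return all(abs(n.utilization - avg) <= threshold for n in nodes)


if __name__ == "__main__":
    nodes = [
        DataNode("dn1", 800, 1000),  # 80 %
        DataNode("dn2", 550, 1000),  # 55 %
        DataNode("dn3", 450, 1000),  # 45 %
        DataNode("dn4", 200, 1000),  # 20 %
    ]
    print(cluster_utilization(nodes))  # 50.0
    print(classify(nodes))
    # {'dn1': 'over-utilized', 'dn2': 'above-average', 'dn3': 'below-average', 'dn4': 'under-utilized'}
    print(is_balanced(nodes))  # False
```

With a 30-point threshold the same sample cluster would count as balanced; in general a smaller threshold yields a more even distribution at the cost of moving more data.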

Published
20/05/2024
FAZUL, Rhauani Weber Aita; MENDIZABAL, Odorico Machado; BARCELOS, Patrícia Pitthan. Dynamic Architecture for Data Replica Balancing in HDFS: Stability, Efficiency, and Data Locality Evaluations. In: SIMPÓSIO BRASILEIRO DE REDES DE COMPUTADORES E SISTEMAS DISTRIBUÍDOS (SBRC), 42., 2024, Niterói/RJ. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2024. p. 239-252. ISSN 2177-9384. DOI: https://doi.org/10.5753/sbrc.2024.1308.
