Effectiveness of the Block Placement Policy in HDFS Replica Balancing
Abstract
The Hadoop Distributed File System (HDFS) is designed to store and transfer data at large scale. To ensure availability and reliability, it uses data replication as a fault-tolerance mechanism. However, this strategy can significantly affect how evenly replicas are balanced across the cluster. This paper analyzes the default data replication policy used by HDFS and measures its impact on system behavior, while presenting different strategies for cluster balancing and rebalancing. To highlight the requirements for efficient replica placement, a comparative study of HDFS performance was conducted, considering a variety of factors that may result in cluster imbalance.