Apache ZooKeeper as an active monitoring strategy for maintaining replica balancing in HDFS
Abstract
Apache ZooKeeper is a scalable and highly reliable coordination service for distributed environments. Built on a shared namespace organized as a tree of znodes, ZooKeeper offers an efficient way to actively manage configuration information. In this work, we examine the use of ZooKeeper and its znodes as a strategy for keeping data evenly distributed in HDFS, a distributed file system built on data replication. Because cluster utilization is monitored in real time, the native HDFS balancer no longer has to be triggered manually, so decisions about replica balancing in the system are automated.
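As a rough illustration of the decision this monitoring automates, the sketch below checks whether any DataNode's utilization deviates from the cluster-wide utilization by more than a threshold, which is the balancing criterion the HDFS balancer itself applies. It is a minimal sketch under stated assumptions, not the method evaluated in this work: the function names, the input dictionary, and the idea that per-node usage figures have already been read from znodes are all hypothetical. Only the `hdfs balancer -threshold` command line is taken from the standard HDFS tooling.

```python
from typing import Dict, List, Tuple

# Hypothetical input: DataNode name -> (used bytes, capacity bytes).
# In a ZooKeeper-based monitor these figures would be read from znodes;
# the znode layout is not specified in the abstract, so it is left out here.
NodeUsage = Dict[str, Tuple[int, int]]


def cluster_utilization(nodes: NodeUsage) -> float:
    """Return overall used space as a percentage of total capacity."""
    total_used = sum(used for used, _ in nodes.values())
    total_capacity = sum(capacity for _, capacity in nodes.values())
    return 100.0 * total_used / total_capacity


def unbalanced_nodes(nodes: NodeUsage, threshold: float = 10.0) -> List[str]:
    """List nodes whose utilization differs from the cluster's by more than
    `threshold` percentage points (assumed criterion, mirroring the balancer)."""
    overall = cluster_utilization(nodes)
    result = []
    for name, (used, capacity) in sorted(nodes.items()):
        node_util = 100.0 * used / capacity
        if abs(node_util - overall) > threshold:
            result.append(name)
    return result


def balancer_command(threshold: float = 10.0) -> List[str]:
    """Build the native balancer invocation; running it is left to the caller."""
    return ["hdfs", "balancer", "-threshold", str(threshold)]


if __name__ == "__main__":
    usage = {"dn1": (90, 100), "dn2": (10, 100), "dn3": (50, 100)}
    if unbalanced_nodes(usage):
        print("would run:", " ".join(balancer_command()))
```

A monitor built this way would re-evaluate the check whenever a usage figure changes and start the balancer only when the returned list is non-empty, which replaces the manual trigger.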
