Apache ZooKeeper as an active monitoring strategy for maintaining replica balancing in HDFS

Abstract


Apache ZooKeeper is a scalable and highly reliable coordination service for distributed environments. Built on a shared namespace organized as a tree of znodes, ZooKeeper is an efficient way to actively manage configuration information. In this work, we analyze the use of ZooKeeper and its znodes as a strategy for keeping the distribution of data balanced in HDFS, a distributed file system that relies on data replication. Because cluster utilization is monitored in real time, the native HDFS balancer no longer has to be triggered manually, which automates the decisions about replica balancing in the system.

Keywords: distributed computing, fault tolerance, data replication, replica balancing, Apache ZooKeeper
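
The decision that the abstract describes as automated can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that per-DataNode usage figures have already been read from the monitoring layer (in the setting of this work, presumably from znodes) and it applies the rule used by the HDFS balancer: the cluster counts as balanced when every DataNode's utilization is within a threshold, 10 percentage points by default, of the overall cluster utilization. The function name `needs_balancing`, the host names, and the shape of the input dictionary are hypothetical.

```python
from typing import Dict, Tuple


def needs_balancing(nodes: Dict[str, Tuple[float, float]],
                    threshold: float = 10.0) -> bool:
    """Return True if any DataNode deviates from cluster utilization
    by more than `threshold` percentage points.

    `nodes` maps a (hypothetical) host name to (used_bytes, capacity_bytes).
    """
    total_used = sum(used for used, _ in nodes.values())
    total_capacity = sum(cap for _, cap in nodes.values())
    if total_capacity <= 0:
        return False  # nothing to balance without reported capacity

    # Cluster utilization: used space over total capacity, in percent.
    cluster_util = 100.0 * total_used / total_capacity

    for used, cap in nodes.values():
        if cap <= 0:
            continue  # skip nodes that report no capacity
        node_util = 100.0 * used / cap
        if abs(node_util - cluster_util) > threshold:
            return True
    return False


# Hypothetical readings, as a monitor might collect them.
skewed = {"dn1": (90.0, 100.0), "dn2": (10.0, 100.0), "dn3": (50.0, 100.0)}
even = {"dn1": (50.0, 100.0), "dn2": (55.0, 100.0), "dn3": (45.0, 100.0)}
print(needs_balancing(skewed), needs_balancing(even))
```

A monitor built along these lines would re-evaluate the check whenever a utilization reading changes and start the balancer only when it returns `True`.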

Published
2020-12-07
FAZUL, Rhauani Weber Aita; BARCELOS, Patrícia Pitthan. Apache ZooKeeper as an active monitoring strategy for maintaining replica balancing in HDFS. In: FAULT TOLERANCE WORKSHOP (WTF), 21., 2020, Rio de Janeiro. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2020. p. 1-14. ISSN 2595-2684. DOI: https://doi.org/10.5753/wtf.2020.12483.