HyperViewStamped: A Hierarchical Approach for Scalable Distributed Replication

  • Gabriela Stein UNIOESTE / UFPR
  • Luiz Antonio Rodrigues UNIOESTE
  • Elias P. Duarte Jr. UFPR

Abstract


This paper introduces a modified version of the classic Viewstamped Replication (VS) algorithm, termed HyperViewStamped (HVS) Replication, which integrates the VCube framework. The proposed HVS algorithm adopts a hierarchical approach to distributed replication, leveraging VCube for two critical functions: failure detection and message broadcasting. VCube, a virtual topology that interconnects processes in a distributed system, inherently provides logarithmic scalability and efficient network properties. In HVS, a primary replica manages client communication and enforces total order consistency. When the primary is suspected of failure, a view change protocol is triggered, where each ``view'' corresponds to a distinct primary. The algorithm features hierarchical implementations for both normal operation and view change procedures. The performance of HVS was evaluated via simulation and compared to traditional VS Replication. Results demonstrate that in fault-free scenarios, HVS achieves lower replication latency as the number of replicas scales. While the traditional VS algorithm outperforms HVS in view change completion time, HVS significantly reduces the message overhead of its failure detection mechanism.

Keywords: distributed systems, group communication, multicast and routing. fault tolerance

References

Birman, K. (2010). A history of the virtual synchrony replication model. In Replication: Theory and Practice, 91–120. Springer.

Birman, K., & Joseph, T. (1987). Exploiting virtual synchrony in distributed systems. In Proceedings of the 11th ACM Symposium on Operating Systems Principles, 123–138.

Chandra, T. D., & Toueg, S. (1996). Unreliable failure detectors for reliable distributed systems. Journal of the ACM (JACM), 43(2): 225–267.

Charron-Bost, B., Pedone, F., & Schiper, A. (2010). Replication. Springer.

de Araujo, J. P., Arantes, L., Duarte Jr, E. P., Rodrigues, L. A., & Sens, P. (2019). Vcubeps: A causal broadcast topic-based publish/subscribe system. Journal of Parallel and Distributed Computing, 125, 18–30.

Duarte, E., Bona, L., & Ruoso, V. (2014). VCube: A provably scalable distributed diagnosis algorithm. In 5th ScalA Workshop, 17–22.

Duarte, E. P., & De Bona, L. E. (2002). A dependable SNMP-based tool for distributed network management. In Proceedings of the International Conference on Dependable Systems and Networks, 279–284. IEEE.

Duarte Jr, E. P., Rodrigues, L. A., Camargo, E. T., & Turchetti, R. C. (2023). The missing piece: A distributed system-level diagnosis model for the implementation of unreliable failure detectors. Computing, 105(12): 2821–2845.

Dwork, C., Lynch, N., & Stockmeyer, L. (1988). Consensus in the presence of partial synchrony. Journal of the ACM (JACM), 35(2): 288–323.

Freitas, A. E. S., Rodrigues, L. A., & Duarte Jr, E. P. (2024). Vcubechain: A scalable permissioned blockchain. Ad Hoc Networks, 158, 103461.

Jeanneau, D., Rodrigues, L. A., Arantes, L., & Duarte Jr, E. P. (2017a). An autonomic hierarchical reliable broadcast protocol for asynchronous distributed systems with failure detection. Journal of the Brazilian Computer Society, 23, 1–14.

Jeanneau, D., Rodrigues, L. A., Arantes, L., & Duarte Jr, E. P. (2017b). An autonomic hierarchical reliable broadcast protocol for asynchronous distributed systems with failure detection. Journal of the Brazilian Computer Society, 23(1): 1–14.

Liskov, B. (2010). From viewstamped replication to Byzantine fault tolerance. In Replication: Theory and Practice, 121–149. Springer.

Liskov, B., & Cowling, J. (2012). Viewstamped replication revisited. Technical Report MIT-CSAIL-TR-2012-021, MIT.

Nassu, B. T., Duarte Jr, E. P., & Ramirez Pozo, A. T. (2005). A comparison of evolutionary algorithms for system-level diagnosis. In Proceedings of the 7th Annual Conference on Genetic and Evolutionary Computation, 2053–2060.

Ongaro, D., & Ousterhout, J. (2014). In search of an understandable consensus algorithm. In USENIX Annual Technical Conference (USENIX ATC 14), 305–319.

Pradhan, D. K., et al. (1996). Fault-tolerant computer system design, volume 132. Prentice-Hall Englewood Cliffs.

Reynal, M. (2005). A short introduction to failure detectors for asynchronous distributed systems. ACM SIGACT News, 36(1):53–70.

Rodrigues, L. A., Arantes, L., and Jr., E. P. D. (2014a). An autonomic implementation of reliable broadcast based on dynamic spanning trees. In EDCC, pages 1–12. IEEE Computer Society.

Rodrigues, L. A., Duarte, E. P., de Araujo, J. P., Arantes, L., and Sens, P. (2018a). Bundling messages to reduce the cost of tree-based broadcast algorithms. In 8th LADC, pages 115–124. IEEE.

Rodrigues, L. A., Duarte Jr, E. P., and Arantes, L. (2014b). Árvores geradoras mínimas distribuídas e autonômicas. SBRC, pages 1–14.

Rodrigues, L. A., Duarte Jr, E. P., and Arantes, L. (2018b). A distributed k-mutual exclusion algorithm based on autonomic spanning trees. Journal of Parallel and Distributed Computing, 115:41–55.

Ruchel, L. V., de Camargo, E. T., Rodrigues, L. A., Turchetti, R. C., Arantes, L., and Duarte Jr, E. P. (2024). Scalable atomic broadcast: A leaderless hierarchical algorithm. Journal of Parallel and Distributed Computing, 184:104789.

Ruchel, L. V., Rodrigues, L. A., Turchetti, R. C., Arantes, L., Duarte Jr, E. P., and Camargo, E. T. (2022). A leaderless hierarchical atomic broadcast algorithm. In Proceedings of the 11th Latin-American Symposium on Dependable Computing (LADC), pages 61–66.

Schneider, F. B. (1990). Implementing fault-tolerant services using the state machine approach: A tutorial. ACM Computing Surveys (CSUR), 22(4):299–319.

Stein, G., Rodrigues, L. A., Duarte Jr, E. P., and Arantes, L. (2023). Diamond-p-vcube: An eventually perfect hierarchical failure detector for asynchronous distributed systems. In Proc. 12th Latin-American Symposium on Dependable and Secure Computing, pages 40–49.

Urban, P., Defago, X., and Schiper, A. (2001). Neko: a single environment to simulate and prototype distributed algorithms. In Proceedings 15th International Conference on Information Networking, pages 503–511.

Van Renesse, R. and Altinbuken, D. (2015). Paxos made moderately complex. ACM Computing Surveys (CSUR), 47(3):1–36.
Published
2025-05-19
STEIN, Gabriela; RODRIGUES, Luiz Antonio; DUARTE JR., Elias P.. HyperViewStamped: A Hierarchical Approach for Scalable Distributed Replication. In: FAULT TOLERANCE WORKSHOP (WTF), 25. , 2025, Natal/RN. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025 . p. 15-28. ISSN 2595-2684. DOI: https://doi.org/10.5753/wtf.2025.8732.