HyperViewStamped Replication: Uma Estratégia para Replicação Distribuída Hierárquica

Gabriela Stein; Luiz Antonio Rodrigues; Elias P. Duarte Jr.

doi:10.5753/wtf.2025.8732

Gabriela Stein UNIOESTE / UFPR
Luiz Antonio Rodrigues UNIOESTE
Elias P. Duarte Jr. UFPR

DOI: https://doi.org/10.5753/wtf.2025.8732

Resumo

Este artigo descreve uma versão do algoritmo clássico de replicação distribuída Viewstamped (VS) Replication que utiliza o VCube. O algoritmo proposto, denominado HyperViewStamped (HVS) Replication, representa uma versão hierárquica para replicação distribuída, que utiliza o VCube para dois propósitos: como detector de falhas e para a difusão de mensagens. O VCube é uma topologia virtual que conecta os processos do sistema distribuído oferecendo diversas propriedades logarítmicas, sendo escalável por definição. O algoritmo HVS utiliza uma réplica primária que comunica com clientes e garante a ordem total. Quando o primário é suspeito de ter falhado, é iniciada uma troca de visão, cada visão corresponde a um primário distinto. São apresentadas versões hierárquicas tanto para o funcionamento normal, como para a troca de visões. O algoritmo proposto foi avaliado através de simulação e comparado com a versão tradicional. Resultados mostram que em cenários normais ambos os algoritmos apresentam desempenho semelhante. Apesar do tempo para a troca de visão ser inferior no algoritmo VS, o número de mensagens do detector de falhas do HVS é menor.

Palavras-chave: algoritmos distribuídos, comunicação em grupo, multicast e roteamento, tolerância a falhas

Referências

Birman, K. (2010). A history of the virtual synchrony replication model. In Replication: Theory and Practice, 91–120. Springer.

Birman, K., & Joseph, T. (1987). Exploiting virtual synchrony in distributed systems. In Proceedings of the 11th ACM Symposium on Operating Systems Principles, 123–138.

Chandra, T. D., & Toueg, S. (1996). Unreliable failure detectors for reliable distributed systems. Journal of the ACM (JACM), 43(2): 225–267.

Charron-Bost, B., Pedone, F., & Schiper, A. (2010). Replication. Springer.

de Araujo, J. P., Arantes, L., Duarte Jr, E. P., Rodrigues, L. A., & Sens, P. (2019). Vcubeps: A causal broadcast topic-based publish/subscribe system. Journal of Parallel and Distributed Computing, 125, 18–30.

Duarte, E., Bona, L., & Ruoso, V. (2014). VCube: A provably scalable distributed diagnosis algorithm. In 5th ScalA Workshop, 17–22.

Duarte, E. P., & De Bona, L. E. (2002). A dependable SNMP-based tool for distributed network management. In Proceedings of the International Conference on Dependable Systems and Networks, 279–284. IEEE.

Duarte Jr, E. P., Rodrigues, L. A., Camargo, E. T., & Turchetti, R. C. (2023). The missing piece: A distributed system-level diagnosis model for the implementation of unreliable failure detectors. Computing, 105(12): 2821–2845.

Dwork, C., Lynch, N., & Stockmeyer, L. (1988). Consensus in the presence of partial synchrony. Journal of the ACM (JACM), 35(2): 288–323.

Freitas, A. E. S., Rodrigues, L. A., & Duarte Jr, E. P. (2024). Vcubechain: A scalable permissioned blockchain. Ad Hoc Networks, 158, 103461.

Jeanneau, D., Rodrigues, L. A., Arantes, L., & Duarte Jr, E. P. (2017a). An autonomic hierarchical reliable broadcast protocol for asynchronous distributed systems with failure detection. Journal of the Brazilian Computer Society, 23, 1–14.

Jeanneau, D., Rodrigues, L. A., Arantes, L., & Duarte Jr, E. P. (2017b). An autonomic hierarchical reliable broadcast protocol for asynchronous distributed systems with failure detection. Journal of the Brazilian Computer Society, 23(1): 1–14.

Liskov, B. (2010). From viewstamped replication to Byzantine fault tolerance. In Replication: Theory and Practice, 121–149. Springer.

Liskov, B., & Cowling, J. (2012). Viewstamped replication revisited. Technical Report MIT-CSAIL-TR-2012-021, MIT.

Nassu, B. T., Duarte Jr, E. P., & Ramirez Pozo, A. T. (2005). A comparison of evolutionary algorithms for system-level diagnosis. In Proceedings of the 7th Annual Conference on Genetic and Evolutionary Computation, 2053–2060.

Ongaro, D., & Ousterhout, J. (2014). In search of an understandable consensus algorithm. In USENIX Annual Technical Conference (USENIX ATC 14), 305–319.

Pradhan, D. K., et al. (1996). Fault-tolerant computer system design, volume 132. Prentice-Hall Englewood Cliffs.

Reynal, M. (2005). A short introduction to failure detectors for asynchronous distributed systems. ACM SIGACT News, 36(1):53–70.

Rodrigues, L. A., Arantes, L., and Jr., E. P. D. (2014a). An autonomic implementation of reliable broadcast based on dynamic spanning trees. In EDCC, pages 1–12. IEEE Computer Society.

Rodrigues, L. A., Duarte, E. P., de Araujo, J. P., Arantes, L., and Sens, P. (2018a). Bundling messages to reduce the cost of tree-based broadcast algorithms. In 8th LADC, pages 115–124. IEEE.

Rodrigues, L. A., Duarte Jr, E. P., and Arantes, L. (2014b). Árvores geradoras mínimas distribuídas e autonômicas. SBRC, pages 1–14.

Rodrigues, L. A., Duarte Jr, E. P., and Arantes, L. (2018b). A distributed k-mutual exclusion algorithm based on autonomic spanning trees. Journal of Parallel and Distributed Computing, 115:41–55.

Ruchel, L. V., de Camargo, E. T., Rodrigues, L. A., Turchetti, R. C., Arantes, L., and Duarte Jr, E. P. (2024). Scalable atomic broadcast: A leaderless hierarchical algorithm. Journal of Parallel and Distributed Computing, 184:104789.

Ruchel, L. V., Rodrigues, L. A., Turchetti, R. C., Arantes, L., Duarte Jr, E. P., and Camargo, E. T. (2022). A leaderless hierarchical atomic broadcast algorithm. In Proceedings of the 11th Latin-American Symposium on Dependable Computing (LADC), pages 61–66.

Schneider, F. B. (1990). Implementing fault-tolerant services using the state machine approach: A tutorial. ACM Computing Surveys (CSUR), 22(4):299–319.

Stein, G., Rodrigues, L. A., Duarte Jr, E. P., and Arantes, L. (2023). Diamond-p-vcube: An eventually perfect hierarchical failure detector for asynchronous distributed systems. In Proc. 12th Latin-American Symposium on Dependable and Secure Computing, pages 40–49.

Urban, P., Defago, X., and Schiper, A. (2001). Neko: a single environment to simulate and prototype distributed algorithms. In Proceedings 15th International Conference on Information Networking, pages 503–511.

Van Renesse, R. and Altinbuken, D. (2015). Paxos made moderately complex. ACM Computing Surveys (CSUR), 47(3):1–36.