Consenso com Recuperação no Modelo Partitioned Synchronous

Sérgio Gorender; Raimundo Macêdo

doi:10.5753/wtf.2010.23092

Sérgio Gorender UFBA
Raimundo Macêdo UFBA

DOI: https://doi.org/10.5753/wtf.2010.23092

Abstract

The partioned synchronous distributed system model has been introduced to take advantage of synchronous partitions of hybrid distributed systems, as such synchronous partitions are implementable in many real scenarios. In this paper we present for the first time a consensus algorithm for proceses that can recover, and its formal proofs, devoted to the partioned synchronous model. The main advantage of the proposed algorithm is that it can tolerate up to n-k process failures in a system with n processes and k synchronous partitions - not all processes need to belong to synchronous partitions. In particular, such robustness is valid even if the majority of processes does not belong to synchronous partitions, which is an advantage in terms of robustness when compared with algorithms for conventional distributed system models.

References

Aguilera, M. K., Chen, W., and Toueg, S. (1998). Failure detection and consensus in the crash-recovery model. In DISC ’98: Proceedings of the 12th International Symposium on Distributed Computing, pages 231–245.

Chandra, T. D. and Toueg, S. (1996). Unreliable failure detectors for reliable distributed systems. Journal of the ACM, 43(2):225–267.

Cristian, F. and Fetzer, C. (1999). The timed asynchronous distributed system model. IEEE Transactions on Parallel and Distributed Systems, 10(6):642–657.

Dwork, C., Lynch, N., and Stockmeyer, L. (1988). Consensus in the presence of partial synchrony. Journal of the ACM, 35(2):288–323.

Fisher, M. J., Lynch, N., and Paterson, M. S. (1985). Impossibility of distributed consensus with one faulty process. Journal of the ACM, 32(2):374–382.

Freiling, F. C., Lambertz, C., and Majster-Cederbaum, M. (2008). Easy consensus algorithms for the crash-recovery model. In DISC ’08: Proceedings of the 22nd international symposium on Distributed Computing, pages 507–508, Berlin, Heidelberg. Springer-Verlag.

Gorender, S. and Macêdo, R. J. A. (2002). Um modelo para tolerância a falhas em sistemas distribuídos com qos. In Anais do Simpósio Brasileiro de Redes de Computadores, SBRC 2002, pages 277–292.

Gorender, S., Macêdo, R. J. A., and Raynal, M. (2007). An adaptive programming model for fault-tolerant distributed computing. IEEE Transactions on Dependable and Secure Computing, 4(1):18–31.

Lamport, L. (1998). The part time parliament. ACM Trans. on Computer Systems, 16(2):133–169.

Lamport, L. (2001). Paxos made simple. Distributed Computing Column of ACM SIGACT News, 32(4):51–58.

Lynch, N. A. (1996). Distributed Algorithms. Morgan Kaufmann Publishers, Inc.

Macêdo, R. J. A. (2007). An integrated group communication infraestructure for hybrid real-time distributed systems. In 9th Workshop on Real-Time Systems, pages 81–88.

Macêdo, R. J. A. and Freitas, A. (2009). A generic group communication approach for hybrid distributed systems. (5523(4)):102–115.

Macêdo, R. J. A. and Gorender, S. (2008). Detectores perfeitos em sistemas distribuídos não síncronos. In IX Workshop de Teste e Tolerância a Falhas (WTF 2008), Rio de Janeiro, Brazil.

Macêdo, R. J. A. and Gorender, S. (2009). Perfect failure detection in the partitioned synchronous distributed system model. In Proceedings of the The Fourth International Conference on Availability, Reliability and Security (ARES 2009), IEEE CS Press. To appear in an extended version in Int. Journal of Critical Computer-Based Systems (IJCCBS).

Macêdo, R. J. A., Gorender, S., and Raynal, M. (2005). A qos-based adaptive model for fault-tolerant distributed computing (with an application to consensus). In Proceedings of IEEE/IFIP Int. Conference on Computer Systems and Networks (DNS05), pages 412–421.

Veríssimo, P. and Casimiro, A. (2002). The ti mely computing base model and architecture. IEEE Transactions on Computers, 51(8):916–930.