Detectores Perfeitos em Sistemas Distribuídos Não Síncronos

Raimundo José de Araújo Macêdo; Sérgio Gorender

doi:10.5753/wtf.2008.23150

Raimundo José de Araújo Macêdo UFBA
Sérgio Gorender UFBA

DOI: https://doi.org/10.5753/wtf.2008.23150

Resumo

No presente artigo mostramos que um detector perfeito de defeitos P (que suspeita todos os processos que falharam se e somente os mesmos falharam) pode ser implementado num sistema mais fraco que o sistema distribuído síncrono (contrariando uma crença estabelecida). Nesse sentido, introduzimos o sistema síncrono particionado (Spa) que é estritamente mais fraco que o sistema síncrono (em Spa não é sempre possível implementar ações síncronas globais como sincronização interna de relógios). Através da propriedade que definimos como sincronia particionada forte, mostramos como implementar P em Spa. Melhor ainda, mostramos que mesmo que Sincronia Particionada Forte não possa ser garantida, podemos ainda assim tirar proveito das partições síncronas existentes para melhorar a robustez das aplicações de tolerância da falhas, através de um detector parcialmente perfeito, denominado por nós de xP. As propriedades e algoritmos necessários para implementar P e xP são introduzidos no artigo, assim como as provas de correção relacionadas.

Referências

Aurrecoechea, C., Campbell, A. T., and Hauw, L. (1998). A survey of qos architectures. ACM Multimedia Systems Journal, Special Issue on QoS Architecture, 6(3):138–151.

Chandra, T. D. and Toueg, S. (1996). Unreliable failure detectors for reliable distributed systems. Journal of the ACM, 43(2):225–267.

Charron-Bost, B., Guerraoui, R., and Schiper, A. (2000). Synchronous system and perfect failure detector: solvability and efficiency issues. In Proceedings of the International Conference on Dependable System and Networks, pages 523–532.

Chen, W., Toueg, S., and Aguilera, M. K. (2000). On the quality of service of failure detectors. In Proceedings of the International Conference on Dependable Systems and Networks (ICDSN/FTCS-30), pages 561–580.

Cristian, F. and Fetzer, C. (1999). The timed asynchronous distributed system model. IEEE Transactions on Parallel and Distributed Systems, 10(6):642–657.

Dwork, C., Lynch, N., and Stockmeyer, L. (1988). Consensus in the presence of partial synchrony. Journal of the ACM, 35(2):288–323.

Falai, L. and Bondavalli, A. (2005). Experimental evaluation of the qos of failure detectors on wide area network. In Proceedings of the International Conference on Dependable Systems and Networks, DSN, pages 624–633.

Fetzer, C. and Cristian, F. (1996). Fail-awareness in timed asynchronous systems. In Proceedings of the 15th Symposium on Principles of Distributed Computing, pages 314–321.

Fisher, M. J., Lynch, N., and Paterson, M. S. (1985). Impossibility of distributed consensus with one faulty process. Journal of the ACM, 32(2):374–382.

Gorender, S. (2005). Um modelo híbrido e adaptativo para sistemas distribuídos tolerantes a falhas. Tese de Doutorado aprovada pelo CIN/UFPE.

Gorender, S., Macêdo, R., and Raynal, M. (2007]). An adaptive programming model for fault-tolerant distributed computing. IEEE Transactions on Dependable and Secure Computing, 4(1):18–31.

Gorender, S. and Macêdo, R. J. d. A. (2002). Um modelo para tolerância a falhas em sistemas distribuídos com qos. In Anais do Simpósio Brasileiro de Redes de Computadores, SBRC 2002, pages 277–292.

Lamport, L., Shostak, R., and Pease, M. (1982). The byzantine generals problem. ACM Trans. Program. Lang. Syst., 4(3):382–401.

Lima, F. R. L. and Macêdo, R. J. d. A. (2005). Adapting failure detectors to communication network load fluctuations using snmp and artificial neural nets. In Second Latin-American Symposium on Dependable Computing (LADC2005), Lecture Notes in Computer Science, pages 191 – 205.

Lynch, N. A. (1996). Distributed Algorithms. Morgan Kaufmann Publishers, Inc.

Macêdo, R. (2000). Failure detection in asynchronous distributed systems. In Proc. of II Workshop on Tests and Fault-Tolerance (II WFT 2000), pages 76–81.

Macêdo, R., Gorender, S., and Cunha, P. (2005). The implementation of a qos-based adaptive distributed system model for fault tolerance. In Anais do Simposio Brasileiro de Redes de Computadores (SBRC 2005), pages 827–840.

Mullender, S. (1993). Distributed Systems. Addison-Wesley Pub Co.

Nunes, R. C. and Jansch-Pôrto, I. (2004). Qos of time-out based self-tuned failure detector: the effects of its communication delay predictor and its safety margin. In IEEE Int. Conf. on Dependable System and Network, track on Performance and Dependability Symposium (DSN-PDS), pages 753–761.

Schiper, A. (1997). Early consensus in an asynchronous systems with a weak failure detector. Distributed Computing, 10(3):149–157.

Veríssimo, P. and Casimiro, A. (2002). The timely computing base model and architecture. IEEE Transactions on Computers, 51(8):916–930.