A New Approach to Optimizing Communication Between Failure Detectors

Rogério Turchetti; Raul Ceretta Nunes

doi:10.5753/wscad.2006.18943

Rogério Turchetti UNIFRA
Raul Ceretta Nunes UFSM

Abstract

Unreliable failure detectors (FDs) are used as a basic building block in the specification and implementation of fault tolerance in asynchronous distributed systems. The Internet is a typical example of a large-scale asynchronous distributed system. In this setting, traditional FDs run into problems because they were designed for controlled networks (LANs). One such problem is message explosion: in large-scale systems, where the number of processes and the delays are unpredictable, it can compromise both the performance of the failure detection service and the scalability of the application. This paper addresses message explosion by proposing a generic and practical approach that relies on message reuse to stand in for control messages in the FDs.
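The abstract does not spell out the mechanism, so the following is only a minimal sketch of one way message reuse could reduce control traffic in a heartbeat-style FD. It assumes that any application message counts as evidence of liveness, so an explicit heartbeat is sent only when nothing else was sent to a peer during the last interval. The class name `PiggybackFD`, its methods, and the `interval` and `timeout` parameters are hypothetical and are not taken from the paper.

```python
class PiggybackFD:
    """Hypothetical heartbeat FD that reuses application messages as heartbeats."""

    def __init__(self, peers, interval, timeout, now=0.0):
        self.interval = interval      # max silence before an explicit heartbeat is sent
        self.timeout = timeout        # max silence before a peer is suspected
        self.last_sent = {p: now for p in peers}
        self.last_heard = {p: now for p in peers}
        self.control_sent = 0         # number of explicit control messages emitted

    def on_app_send(self, peer, now):
        # An outgoing application message doubles as a heartbeat (assumption).
        self.last_sent[peer] = now

    def on_receive(self, peer, now):
        # Any incoming message, application or control, proves the peer is alive.
        self.last_heard[peer] = now

    def tick(self, now):
        # Return the peers that still need an explicit heartbeat at time `now`.
        due = [p for p, t in self.last_sent.items() if now - t >= self.interval]
        for p in due:
            self.last_sent[p] = now
            self.control_sent += 1
        return due

    def suspects(self, now):
        # Peers that have been silent for longer than the timeout.
        return {p for p, t in self.last_heard.items() if now - t > self.timeout}
```

In this sketch, the more application traffic two processes exchange, the fewer control messages `tick` emits. A detector that sends heartbeats unconditionally would emit one per peer per interval regardless of that traffic.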
