Programming a Synchronous Subsystem to Support Efficient Fault-Tolerance Mechanisms

Andrey E. M. Brito; Francisco V. Brasileiro

doi:10.5753/wtf.2004.23377

Andrey E. M. Brito (UFCG)
Francisco V. Brasileiro (UFCG)

Abstract

In this work we present the design and implementation of a synchronous subsystem (wormhole) to be incorporated into an asynchronous system. Its purpose is to make it possible to solve fault-tolerant distributed problems that would otherwise have no deterministic solution in a purely asynchronous system. The wormhole architecture incorporates basic services, such as clock synchronization and detection of failures of the machines in the system, and also offers a programming interface that allows specialized synchronous services to be installed. One such service is presented to illustrate how an application uses the wormhole.
