Programming a Synchronous Subsystem to Support Efficient Fault-Tolerance Mechanisms
Abstract
In this work we propose the design and implementation of a wormhole, a synchronous subsystem that is appended to an asynchronous system to allow the solution of fault-tolerant distributed problems that would otherwise have no deterministic solution in a purely asynchronous system. The wormhole architecture encompasses basic services such as clock synchronization and node-level failure detection, as well as a programming interface that allows the deployment of specialized synchronous services. One of these services is presented to illustrate how an application uses the wormhole.
References
Bovet, D. and Cesati, M. (2003). Understanding the Linux Kernel. O’Reilly, 3rd edition.
Brito, A. E. M. (2004). Uma arquitetura híbrida para o suporte de protocolos distribuídos tolerantes a falhas. Master's thesis, COPIN - Universidade Federal da Paraíba, Campina Grande.
Casimiro, A., Martins, P., and Veríssimo, P. (2000). How to build a timely computing base using Real-Time Linux. In Proceedings of the 2000 IEEE International Workshop on Factory Communication Systems, pages 127–134, Porto, Portugal. IEEE Industrial Electronics Society.
Chandra, T. and Toueg, S. (1996). Unreliable failure detectors for reliable distributed systems. Journal of the ACM, 43(2):225–267.
Charron-Bost, B., Guerraoui, R., and Schiper, A. (2000). Synchronous system and perfect failure detector: solvability and efficiency issues. In Proceedings of the IEEE Int. Conf. on Dependable Systems and Networks (DSN), pages 523–532, New York, USA. IEEE Computer Society.
Cristian, F. and Fetzer, C. (1999). The timed asynchronous distributed system model. IEEE Transactions on Parallel and Distributed Systems, 10(6):642–657.
Fetzer, C. (2003). Perfect failure detection in timed asynchronous systems. IEEE Transactions on Computers, 52(2):99–112.
Fischer, M. J., Lynch, N. A., and Paterson, M. S. (1985). Impossibility of distributed consensus with one faulty process. Journal of the ACM, 32(2):374–382.
Larrea, M., Fernández, A., and Arévalo, S. (2001). On the impossibility of implementing perpetual failure detectors in partially synchronous systems. In Brief Announcements 15th Int’l Symp. Distributed Computing (DISC 2001).
Oliveira, E. W., Brito, A. E. M., and Brasileiro, F. V. (2003). Projeto e implementação de um serviço de detecção de falhas perfeito. In Simpósio Brasileiro de Redes de Computadores, pages 697–712, Natal/RN, Brasil.
Sabel, L. S. and Marzullo, K. (1995). Election vs. consensus in asynchronous systems. Technical Report TR95-1488, Cornell University.
Veríssimo, P. and Almeida, C. (1995). Quasi-synchronism: a step away from the traditional fault-tolerant real-time system models. Bulletin of the Technical Committee on Operating Systems and Application Environments (TCOS), 7(4):35–39.
Veríssimo, P. (2003). Uncertainty and predictability: Can they be reconciled? Future Directions in Distributed Computing, Springer Verlag LNCS 2584, pages 108–113.
Veríssimo, P. and Casimiro, A. (2002). The Timely Computing Base model and architecture. IEEE Transactions on Computers, 51(8):916–930.
Published
2004-05-10
How to Cite
BRITO, Andrey E. M.; BRASILEIRO, Francisco V. Programando um Subsistema Síncrono para Suporte a Mecanismos Eficientes de Tolerância a Falhas. In: FAULT TOLERANCE WORKSHOP (WTF), 5., 2004, Gramado/RS. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2004. p. 25-36. ISSN 2595-2684. DOI: https://doi.org/10.5753/wtf.2004.23377.
