Performance Evaluation of Checkpoint and Rollback-Recovery Algorithms for Distributed Systems

  • Sérgio Luis Cechin UFRGS
  • Ingrid Jansch-Pôrto UFRGS

Abstract


In distributed systems, backward recovery is implemented mainly through two paradigms: synchronous and asynchronous checkpointing. In this paper we compare one representative algorithm from each group and present some theoretical results. The synchronous algorithm of Koo & Toueg and the asynchronous algorithm of Juang & Venkatesan were chosen for this purpose. Our goal is to show that their relative advantages and disadvantages depend mainly on the characteristics of the applications.

Keywords: Fault Tolerance, Distributed Systems, Rollback Recovery, Synchronous and Asynchronous Checkpointing, Performance Evaluation

References

Cechin, S. L. On the theoretical performance evaluation of two rollback-recovery synchronous and asynchronous algorithms. CPGCC / UFRGS. April, 1998 (Report TI nº 729) - In Portuguese.

Jalote, P. Fault Tolerance in Distributed Systems. New Jersey: Prentice-Hall, 1994.

Juang, T.; Venkatesan, S. Crash Recovery with Little Overhead. Int'l. Conf. on Distributed Computing Systems. Proceedings. May 1991. pp.454-461.

Koo, R.; Toueg, S. Checkpointing and Rollback-Recovery for Distributed Systems. IEEE Trans. on Software Engineering, v.SE-13(1):23-31, Jan. 1987.

Singhal, M.; Shivaratri, N. Advanced Concepts in Operating Systems. New York: McGraw-Hill, 1994.
Published
28/09/1998
CECHIN, Sérgio Luis; JANSCH-PÔRTO, Ingrid. Performance Evaluation of Checkpoint and Rollback-Recovery Algorithms for Distributed Systems. In: INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE AND HIGH PERFORMANCE COMPUTING (SBAC-PAD), 10., 1998, Búzios/RJ. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 1998. p. 137-146. DOI: https://doi.org/10.5753/sbac-pad.1998.22668.