Implementation of a Rollback Recovery Mechanism for the OurGrid Computing Environment (Implementação de um Mecanismo de Recuperação por Retorno para o Ambiente de Computação OurGrid)

  • Hélio Antônio Miranda da Silva UFRGS
  • Tórgan Flores de Siqueira UFRGS
  • Leonardo Rech Dalpiaz UFRGS
  • Ingrid Jansch-Pôrto UFRGS
  • Taisy Silva Weber UFRGS

Abstract


OurGrid is a middleware that supports the execution of applications on resources available in a grid. A central node schedules and controls the applications, which makes it a critical point of failure for the system. To reduce the computation lost when the central node suffers a crash failure, we propose an implementation of a rollback recovery mechanism. We assess the overhead this mechanism imposes on the OurGrid scheduler in scenarios with varying characteristics.

Published
2006-05-29
SILVA, Hélio Antônio Miranda da; SIQUEIRA, Tórgan Flores de; DALPIAZ, Leonardo Rech; JANSCH-PÔRTO, Ingrid; WEBER, Taisy Silva. Implementação de um Mecanismo de Recuperação por Retorno para o Ambiente de Computação OurGrid. In: FAULT TOLERANCE WORKSHOP (WTF), 7., 2006, Curitiba/PR. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2006. p. 111-122. ISSN 2595-2684. DOI: https://doi.org/10.5753/wtf.2006.23356.