Fault-Tolerant Scheduling for Multicore Clusters

  • Brevik Ferreira da Silva UFRN
  • Wellison Moura dos Santos UFRN
  • Idalmis Milián Sardiña UFRN
  • Livia de Mesquita Teixeira UFRN
  • Felipe de Albuquerque UFRN

Abstract


Running large-scale parallel applications with high performance and without failures is a challenge in high-performance computing. Achieving good results requires efficient use of the available resources, for example by exploiting both the shared and the distributed memory of modern multicore architectures. This paper proposes a hybrid approach to fault-tolerant scheduling on clusters of multicore processors. A case study is presented for a parallel application modeled as a Directed Acyclic Graph (DAG), using hybrid OpenMP and MPI programming. The paper discusses the advantages this model can bring to these architectures compared with the previous approach.

Published
2013-05-06
SILVA, Brevik Ferreira da; SANTOS, Wellison Moura dos; SARDIÑA, Idalmis Milián; TEIXEIRA, Livia de Mesquita; ALBUQUERQUE, Felipe de. Escalonamento Tolerante a Falhas para Clusters Multicores. In: FAULT TOLERANCE WORKSHOP (WTF), 14., 2013, Brasília/DF. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2013. p. 77-88. ISSN 2595-2684. DOI: https://doi.org/10.5753/wtf.2013.23017.