Self-Organization in the Selection of Migration Candidates in Bulk Synchronous Parallel Applications
Abstract
Process rescheduling can be a suitable technique for achieving high performance, mainly when irregular workloads and cluster-of-clusters environments are considered. The decision to move processes to new resources is NP-Hard, and heuristics take place in order to reach good results inside an acceptable time interval. In this way, this paper presents AutoMig – a novel heuristic for BSP (Bulk Synchronous Parallel) applications that self-organizes the selection of candidates for migration on different clusters. Its differential approach consists in a prediction function (pf) that considers processes’ computation and communication data as well as their migrations costs. pf is applied over a list of schedules and AutoMig’s final step decides whether one of them outperforms the current mapping. The results emphasize gains up to 32% when testing a CPU-bound application in a simulated cluster-of-clusters environment.
References
Bhandarkar, M. A., Brunner, R., and Kale, L. V. (2000). Run-time support for adaptive load balancing. In IPDPS ’00: Proceedings of the 15 IPDPS 2000 Workshops on Parallel and Distributed Processing, pages 1152–1159, London, UK. Springer-Verlag.
Bonorden, O. (2007). Load balancing in the bulk-synchronous-parallel setting using process migrations. In Int. Parallel and Distrib. Processing Symp., 2007,pages 1–9.IEEE.
Bonorden, O., Gehweiler, J., and auf der Heide, F. M. (2005). Load balancing strategies in a web computing environment. In Proceeedings of International Conference on Parallel Processing and Applied Mathematics (PPAM), pages 839–846, Poznan, Poland.
Casanova, H., Legrand, A., and Quinson, M. (2008). Simgrid: A generic framework for large-scale distributed experiments. In Tenth International Conference on Computer Modeling and Simulation (uksim), pages 126–131, Los Alamitos, CA, USA. IEEE.
Chen, L., Wang, C.-L., and Lau, F. (2008). Process reassignment with reduced migration cost in grid load rebalancing. Parallel and Distrib. Processing, 2008, pages 1–13,IEEE.
da Rosa Righi, R., Pilla, L. L., Carissimi, A., Navaux, P. A., and Heiss, H.-U. (2010). Observing the impact of multiple metrics and runtime adaptations on bsp process rescheduling. Parallel Processing Letters, 20(2):123–144.
da Silva e Silva, F. J., Kon, F., Goldman, A., Finger, M., de Camargo, R. Y., Filho, F. C., and Costa, F. M. (2010). Application execution management on the integrade opportunistic grid middleware. J. Parallel Distrib. Comput., 70(5):573–583.
De Grande, R. E. and Boukerche, A. (2011). Dynamic balancing of communication and computation load for hla-based simulations on large-scale distributed systems. J. Parallel Distrib. Comput., 71:40–52.
Ding, D., Luo, S., and Gao, Z. (2009). A dual heuristic scheduling strategy based on task partition in grid environments. In Int. Joint Conference on Computational Sciences and Optimization, pages 63–67, IEEE.
Duselis, J., Cauich, E., Wang, R., and Scherson, I. (2009). Resource selection and allocation for dynamic adaptive computing in heterogeneous clusters. In Cluster Computing and Workshops, 2009. CLUSTER ’09. IEEE International Conference on, pages 1 –9.
El Kabbany, G., Wanas, N., Hegazi, N., and Shaheen, S. (2011). A dynamic load balancing framework for real-time applications in message passing systems. International Journal of Parallel Programming, 39:143–182. DOI: 10.1007/s10766-010-0134-5.
Guo, Y., Chen, X., Deng, M., Wang, Z., Lv, W., Xu, C., and Wang, T. (2009). The fractal compression coding in mobile video monitoring system. In WRI International Conference on Comm. and Mobile Computing, pages 492–495, IEEE.
Hendrickson, B. (2009). Computational science: Emerging opportunities and challenges. Journal of Physics: Conference Series, 180(1):012013.
Huang, C., Zheng, G., Kale, L., and Kumar, S. (2006). Performance evaluation of adaptive mpi. In PPoPP ’06: Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming, pages 12–21, ACM Press.
Kwok, Y.-K. and Cheung, L.-S. (2004). A new fuzzy-decision based load balancing system for distributed object computing. J. Parallel Distrib. Comput., 64(2):238–253.
Moreno-Vozmediano, R. and Alonso-Conde, A. B. (2005). Influence of grid economic factors on scheduling and migration. In High Perf. Comp. for Computational Science, volume 3402 of Lecture Notes in Comp. Science, pages 274–287. Springer.
Nascimento, A. P., Sena, A. C., Boeres, C., and Rebello, V. E. F. (2007). Distributed and dynamic self-scheduling of parallel mpi grid applications: Research articles. Concurr. Comput. : Pract. Exper., 19(14):1955–1974.
Pascual, J. A., Navaridas, J., and Miguel-Alonso, J. (2009). Job scheduling strategies for parallel processing. chapter Effects of Topology-Aware Allocation Policies on Scheduling Performance, pages 138–156. Springer-Verlag, Berlin, Heidelberg.
Qin, X., Jiang, H., Manzanares, A., Ruan, X., and Yin, S. (2010). Communication-aware load balancing for parallel applic. on clusters. IEEE Trans. Comput., 59(1):42–52.
Sanjay, H. A. and Vadhiyar, S. S. (2009). A strategy for scheduling tightly coupled parallel applications on clusters. Concurr. Comput. : Pract. Exper., 21(18):2491–2517.
Silva, R. E., Pezzi, G., Maillard, N., and Diverio, T. (2005). Automatic data-flow graph generation of mpi programs. In SBAC-PAD ’05: Proceedings of the 17th International Symposium on Computer Architecture on High Performance Computing, pages 93–100, Washington, DC, USA. IEEE Computer Society.
Tanenbaum, A. (2003). Computer Networks. Prentice Hall, New Jersey, 4th ed.
Vadhiyar, S. S. and Dongarra, J. J. (2005). Self adaptivity in grid computing: Research articles. Concurr. Comput. : Pract. Exper., 17(2-4):235–257.
Xhafa, F. and Abraham, A. (2010). Computational models and heuristic methods for grid scheduling problems. Future Gener. Comput. Syst., 26(4):608–621.
Xing, C. (2008). An adaptive domain pool scheme for fractal image compression. Education Technology and Training and Geoscience and Remote Sensing, 2:719–722.
Yang, C.-T. and Chou, K.-Y. (2009). An adaptive job allocation strategy for heterogeneous multiple clusters. In Int. Conf. on Computer and Information Technology - Volume 02, pages 209–214, IEEE.
Zaki, M. J., Li, W., and Parthasarathy, S. (1997). Customized dynamic load balancing for a network of workstations. J. Parallel Distrib. Comput., 43(2):156–162.
