Swirls: A Platform for Enabling Multicluster and Multicloud Execution of Parallel Programs
Abstract
Swirls is a general-purpose application for interactively building, deploying, and executing message-passing parallel programs that address multicluster and multicloud requirements. It is implemented on HPC Shelf, a cloud-based platform for providing HPC services. Swirls enables communication between MPI programs written in C#, C, C++, and Python across one or more clusters, whether on-premises or cloud-based. In the current implementation, users of Swirls may use clusters formed by virtual machines on Amazon Elastic Compute Cloud (EC2) and Google Cloud Platform (GCP).
References
M. A. S. Netto, R. N. Calheiros, E. R. Rodrigues, R. L. F. Cunha, and R. Buyya, “HPC Cloud for Scientific and Business Applications: Taxonomy, Vision, and Research Challenges,” ACM Computing Surveys, vol. 51, no. 1, pp. 1–29, Jan. 2018. [Online]. Available: http://doi.acm.org/10.1145/3150224
M. Zahran, “Heterogeneous Computing: Here to Stay,” Communications of the ACM, vol. 60, no. 3, pp. 42–45, Feb. 2017. [Online]. Available: http://doi.acm.org/10.1145/3024918
F. H. de Carvalho Junior, J. C. Silva, and A. B. O. Dantas, “A Scientific Workflow Management System for Orchestration of Parallel Components in a Cloud of Large-Scale Parallel Processing Services,” Science of Computer Programming, vol. 173, pp. 95–127, Mar. 2019.
F. H. de Carvalho Junior, W. G. Al Alam, and A. B. O. Dantas, “Contextual Contracts for Component-Oriented Resource Abstraction in a Cloud of High Performance Computing Services,” Concurrency and Computation: Practice and Experience, vol. 33, no. 18, p. e6225. [Online]. Available: https://onlinelibrary.wiley.com/doi/abs/10.1002/cpe.6225
J. Dongarra, S. W. Otto, M. Snir, and D. Walker, “A Message Passing Standard for MPP and Workstations,” Communications of the ACM, vol. 39, no. 7, pp. 84–90, 1996.
I. Flouris, V. Manikaki, N. Giatrakos, A. Deligiannakis, M. Garofalakis, M. Mock, S. Bothe, I. Skarbovsky, F. Fournier, M. Stajcer, T. Krizan, J. Yom-Tov, and T. Curin, “Ferari: A prototype for complex event processing over streaming multi-cloud platforms,” in Proceedings of the 2016 International Conference on Management of Data, ser. SIGMOD ’16. New York, NY, USA: Association for Computing Machinery, 2016, p. 2093–2096. [Online]. Available: https://doi.org/10.1145/2882903.2899395
N. Ferry, F. Chauvel, H. Song, A. Rossini, M. Lushpenko, and A. Solberg, “Cloudmf: Model-driven management of multi-cloud applications,” ACM Trans. Internet Technol., vol. 18, no. 2, Jan. 2018. [Online]. Available: https://doi.org/10.1145/3125621
D. Wu, S. Sakr, L. Zhu, and H. Wu, “Towards big data analytics across multiple clusters,” in Proceedings of the 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, ser. CCGrid ’17. IEEE Press, 2017, p. 218–227. [Online]. Available: https://doi.org/10.1109/CCGRID.2017.73
K. Maheshwari, E.-S. Jung, J. Meng, V. Morozov, V. Vishwanath, and R. Kettimuthu, “Workflow performance improvement using model-based scheduling over multiple clusters and clouds,” Future Generation Computer Systems, vol. 54, pp. 206–218, 2016. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0167739X15000795
B. Fakih and D. El Baz, “Heterogeneous computing and multi-clustering support via peer-to-peer hpc,” in 2018 26th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP), 2018, pp. 292–296.
A. Mosa, T. Kiss, G. Pierantoni, J. DesLauriers, D. Kagialis, and G. Terstyanszky, “Towards a cloud native big data platform using micado,” in 2020 19th International Symposium on Parallel and Distributed Computing (ISPDC), 2020, pp. 118–125.
P. A. R. S. Costa, F. M. V. Ramos, and M. Correia, “Chrysaor: Fine-grained, fault-tolerant cloud-of-clouds mapreduce,” in Proceedings of the 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, ser. CCGrid ’17. IEEE Press, 2017, p. 421–430. [Online]. Available: https://doi.org/10.1109/CCGRID.2017.89
P. A. R. S. Costa, X. Bai, F. M. V. Ramos, and M. Correia, “Medusa: An efficient cloud fault-tolerant mapreduce,” in 2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), 2016, pp. 443–452.
F. H. de Carvalho Junior and C. A. Rezende, “A Case Study on Expressiveness and Performance of Component-Oriented Parallel Programming,” J. of Parallel and Distributed Computing, vol. 73, no. 5, pp. 557–569, 2013. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0743731512002882
L. Deng, “The MNIST Database of Handwritten Digit Images for Machine Learning Research,” IEEE Signal Processing Magazine, vol. 29, no. 6, pp. 141–142, 2012.
A. Sergeev and M. Del Balso, “Horovod: Fast and Easy Distributed Deep Learning in TensorFlow,” arXiv preprint arXiv:1802.05799, 2018.
J. Dongarra, “Basic Linear Algebra Subprograms Technical Forum Standard I,” International Journal of High Performance Applications and Supercomputing, vol. 16, no. 2, pp. 115–199, Feb. 2002.
Published
October 26, 2021
How to Cite
CARVALHO JUNIOR, Francisco Heron de; DANTAS, Allberson Bruno de Oliveira; SALES, Claro Henrique Silva. Swirls: A Platform for Enabling Multicluster and Multicloud Execution of Parallel Programs. In: SIMPÓSIO EM SISTEMAS COMPUTACIONAIS DE ALTO DESEMPENHO (SSCAD), 22., 2021, Belo Horizonte. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2021. p. 168-179. DOI: https://doi.org/10.5753/wscad.2021.18521.