Cost and Performance Analysis of a Fault-Tolerant Atmospheric Modeling System on AWS ParallelCluster
Abstract
This work analyzes the performance of the BRAMS numerical weather prediction model running on an AWS cluster created with AWS ParallelCluster across different instance markets, and compares it with execution on the Santos Dumont supercomputer. A methodology is proposed for running a fault-tolerant version of BRAMS on the Spot market, where instances cost less but can be revoked. Execution times in the cloud were satisfactory compared with those on Santos Dumont. In general, the Spot solution reduced the financial cost compared with regular On-Demand instances. Only in a scenario with many revocations, which in turn increase both execution time and cost, was the On-Demand market the more suitable option.