ProvDeploy: Exploring Containerization Alternatives with Provenance for HPC Scientific Applications
Abstract
Scientific applications demand High Performance Computing (HPC) environments. These applications are built from many components drawn from libraries and different environments, so managing their software stack at deployment and execution time is far from trivial. This complexity grows when the user needs to couple provenance data capture services to the application. This article presents ProvDeploy, a tool that helps users configure containers for their applications with provenance capture. ProvDeploy was evaluated with a data-intensive Bioinformatics application, exploring containerization alternatives in two HPC environments.