AkôFlow: A Middleware for Executing Scientific Workflows in Multiple Containerized Environments
Abstract
Many scientific workflows produce large volumes of data and require parallelism techniques and distributed environments to reduce their execution time. Workflow Management Systems execute these workflows efficiently, but they typically target specific environments. Container technology, based on operating-system-level virtualization, has emerged as a way for applications to run in heterogeneous environments. Although container management and orchestration solutions exist, e.g., Kubernetes, they are not designed for scientific workflows. In this paper, we propose AkôFlow, a middleware for the parallel execution of scientific workflows in containerized environments. AkôFlow allows scientists to explore the parallel execution of workflow activities and supports provenance capture. We evaluated AkôFlow with an astronomy workflow, and the results are promising.
