MAESTRO: An Approach for Composition and Analysis of Script-Based Workflows using Ontologies
Abstract
Specifying workflows that implement scientific experiments as scripts is challenging, mainly because each step of the experiment can be implemented by multiple programs. Choosing or configuring these programs inappropriately can cause inconsistencies, such as incompatible data formats or unmet dependencies. Furthermore, even when a script is well specified and executes correctly, analyzing the data it produces can be difficult without knowledge of the experiment’s domain terms and of how the workflow was specified. In this paper, we present MAESTRO, an approach that uses ontologies and provenance data to support the composition and analysis of workflows implemented as scripts. MAESTRO combines Experiment Lines with domain data concepts and uses reasoners to specify a script and to support analytical queries. MAESTRO was evaluated through a feasibility study in the bioinformatics domain, and the results were promising.
