Towards a graphical tool for modeling scientific workflows’ provenance according to the W3C PROV standard

  • Marcos Alves Vieira IF Goiano / UFG
  • Sergio T. Carvalho UFG


Provenance describes the steps involved in producing a piece of data and allows an assessment of its quality, reliability, or credibility. For scientific workflows, provenance establishes the relationships between the artifacts associated with a given set of simulations and can be used to enable: (i) their sharing with the scientific community, (ii) the reproducibility of the results, or (iii) the evaluation of erroneous outputs. This paper presents work in progress towards a graphical provenance modeling tool that conforms to the W3C PROV standard and follows Model-Driven Engineering (MDE) concepts. The tool can be used to model the provenance of scientific workflows, enabling, for instance, their visual representation, reproducibility, and sharing.
Keywords: Provenance, W3C PROV, scientific workflows, metamodel, MDE, EMF, GMF, modeling tool, Eclipse Sirius


VIEIRA, Marcos Alves; CARVALHO, Sergio T. Towards a graphical tool for modeling scientific workflows’ provenance according to the W3C PROV standard. In: BRAZILIAN E-SCIENCE WORKSHOP (BRESCI), 16., 2022, Niterói. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2022. p. 97-104. ISSN 2763-8774. DOI: