Polyflow: A SOA for Analyzing Workflow Heterogeneous Provenance Data in Distributed Environments

  • Yan Mendes Universidade Federal de Juiz de Fora (UFJF)
  • Regina Braga Universidade Federal de Juiz de Fora (UFJF)
  • Victor Ströele Universidade Federal de Juiz de Fora (UFJF)
  • Daniel de Oliveira Universidade Federal Fluminense (UFF)

Abstract


In the last decade, the (big) data-driven science paradigm has become a widespread reality. However, this approach has some limitations, such as its performance depending on the quality of the data and the lack of reproducibility of results. To enable reproducibility, many tools, such as Workflow Management Systems, have been developed to formalize process pipelines and capture execution traces. However, interoperability among the data generated by these solutions has become a problem, since most systems adopt proprietary data models. To support interoperability across heterogeneous provenance data, we propose a Service Oriented Architecture with a polystore storage design in which provenance is conceptually represented using the ProvONE model. A wrapper layer transforms data described in heterogeneous formats into ProvONE-compliant data. Moreover, we propose a query layer that provides location and access transparency to users. Furthermore, we conduct two feasibility studies based on real use-case scenarios. First, we illustrate how two research groups can compare their processes and results. Second, we show how our architecture can be used as a queryable provenance repository. We demonstrate Polyflow's viability in both scenarios using the Goal-Question-Metric methodology. Finally, we highlight our solution's usability and extensibility by comparing it with similar approaches.
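The abstract does not detail the interfaces of the wrapper and query layers, so the following Python sketch is only an illustration of the idea under stated assumptions. It supposes two research groups whose systems log provenance in different hypothetical formats. Each hypothetical wrapper (`wrap_system_a`, `wrap_system_b`) normalizes its records into simplified ProvONE-style triples. The hypothetical `QueryLayer` answers lineage questions across all stores without the caller naming any of them. The flat `Triple` encoding, the sample file names, and every function name are assumptions, not Polyflow's actual schema or API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Optional, Set

# Simplified ProvONE-style statement (subject, predicate, object).
# The predicates reuse PROV/ProvONE vocabulary, but this flat triple
# encoding is an illustrative assumption, not Polyflow's storage schema.
@dataclass(frozen=True)
class Triple:
    subject: str
    predicate: str
    obj: str


def wrap_system_a(records: Iterable[dict]) -> List[Triple]:
    """Hypothetical wrapper: the source logs one dict per task run."""
    triples: List[Triple] = []
    for r in records:
        run = f"exec:{r['run_id']}"
        triples.append(Triple(run, "rdf:type", "provone:Execution"))
        for name in r.get("inputs", []):
            triples.append(Triple(run, "prov:used", f"data:{name}"))
        for name in r.get("outputs", []):
            triples.append(Triple(f"data:{name}", "prov:wasGeneratedBy", run))
    return triples


def wrap_system_b(rows: Iterable[tuple]) -> List[Triple]:
    """Hypothetical wrapper: the source logs (activity, "in"|"out", artifact) rows."""
    triples: List[Triple] = []
    for activity, direction, artifact in rows:
        run, data = f"exec:{activity}", f"data:{artifact}"
        triples.append(Triple(run, "rdf:type", "provone:Execution"))
        if direction == "in":
            triples.append(Triple(run, "prov:used", data))
        else:
            triples.append(Triple(data, "prov:wasGeneratedBy", run))
    return triples


class QueryLayer:
    """Answers queries over all registered stores, so callers never name a store."""

    def __init__(self) -> None:
        self._stores: Dict[str, List[Triple]] = {}

    def register(self, name: str, wrapper: Callable[[Iterable], List[Triple]],
                 raw: Iterable) -> None:
        # Normalize the raw source through its wrapper before storing it.
        self._stores[name] = wrapper(raw)

    def match(self, subject: Optional[str] = None, predicate: Optional[str] = None,
              obj: Optional[str] = None) -> Set[Triple]:
        # None acts as a wildcard; results are merged across every store.
        return {
            t
            for triples in self._stores.values()
            for t in triples
            if (subject is None or t.subject == subject)
            and (predicate is None or t.predicate == predicate)
            and (obj is None or t.obj == obj)
        }

    def upstream(self, data_id: str) -> Set[str]:
        """All data transitively used to generate data_id, across stores."""
        seen: Set[str] = set()
        frontier = [data_id]
        while frontier:
            current = frontier.pop()
            for gen in self.match(subject=current, predicate="prov:wasGeneratedBy"):
                for use in self.match(subject=gen.obj, predicate="prov:used"):
                    if use.obj not in seen:
                        seen.add(use.obj)
                        frontier.append(use.obj)
        return seen


ql = QueryLayer()
ql.register("groupA", wrap_system_a,
            [{"run_id": "a1", "inputs": ["raw.csv"], "outputs": ["clean.csv"]}])
ql.register("groupB", wrap_system_b,
            [("b1", "in", "clean.csv"), ("b1", "out", "model.pkl")])
print(sorted(ql.upstream("data:model.pkl")))  # ['data:clean.csv', 'data:raw.csv']
```

Because `upstream` follows `prov:wasGeneratedBy` and `prov:used` edges regardless of which store holds them, the lineage of `model.pkl` crosses from group B's records into group A's. The caller never names either store, which is the kind of location transparency the abstract describes. A real polystore would push these filters down to each backend instead of scanning triples in memory.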
Keywords: Workflow interoperability, heterogeneous provenance data integration, polystore
Published
20/05/2019
How to Cite

MENDES, Yan; BRAGA, Regina; STRÖELE, Victor; DE OLIVEIRA, Daniel. Polyflow: A SOA for Analyzing Workflow Heterogeneous Provenance Data in Distributed Environments. In: SIMPÓSIO BRASILEIRO DE SISTEMAS DE INFORMAÇÃO (SBSI), 15., 2019, Aracaju. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2019. p. 383-390.