A Strategy for Versioning the Data of Scientific Workflows Executed in the Cloud

  • Fabrício Nogueira UFF
  • Kary Ocaña LNCC
  • Vítor Silva UFRJ
  • Vanessa Braganholo UFF
  • Daniel de Oliveira UFF

Abstract


Scientific experiments are usually executed hundreds or thousands of times, generating a huge amount of data that needs to be managed. Analyzing and comparing the results of such experiments is an extremely complex task. The task becomes even more complex for workflows running in the cloud, because the data is scattered across multiple virtual machines. To alleviate this problem, previous work proposed using a version control system to manage the data consumed and generated by scientific experiments. However, these approaches add considerable overhead to the experiment, increasing both processing time and disk usage. In this article, we propose an alternative strategy that aims to reduce both time and space overhead. Our initial experiments show that the time overhead of our approach is still high, but its disk overhead was five times smaller than that of the approaches in the literature.
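As a generic illustration of versioning the files that each workflow run consumes and produces, here is a minimal Python sketch. It is a hypothetical example, not the strategy proposed in this paper: the `VersionStore` class, its methods, and the file names are illustrative assumptions. Each distinct file content is stored once under its SHA-256 digest. Each run keeps only a manifest that maps file names to digests. As a result, a file that is identical across runs takes no extra disk space, and comparing two runs reduces to comparing their manifests.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class VersionStore:
    """Hypothetical content-addressed store for workflow run data (illustration only)."""

    # digest -> file content, each distinct content stored exactly once
    blobs: dict[str, bytes] = field(default_factory=dict)
    # run id -> manifest mapping file name -> content digest
    snapshots: dict[str, dict[str, str]] = field(default_factory=dict)

    def commit(self, run_id: str, files: dict[str, bytes]) -> None:
        """Record a snapshot of the files consumed/produced by one run."""
        manifest = {}
        for name, content in files.items():
            digest = hashlib.sha256(content).hexdigest()
            # Only store the content if an identical blob is not already present.
            self.blobs.setdefault(digest, content)
            manifest[name] = digest
        self.snapshots[run_id] = manifest

    def diff(self, run_a: str, run_b: str) -> set[str]:
        """Names of files that were added, removed, or changed between two runs."""
        a, b = self.snapshots[run_a], self.snapshots[run_b]
        return {name for name in a.keys() | b.keys() if a.get(name) != b.get(name)}

    def stored_bytes(self) -> int:
        """Total bytes kept in the store after deduplication."""
        return sum(len(content) for content in self.blobs.values())


# Usage: two runs that share the same input but produce different trees.
store = VersionStore()
store.commit("run-1", {"input.fasta": b"ACGT" * 1000, "tree.nwk": b"(A,B);"})
store.commit("run-2", {"input.fasta": b"ACGT" * 1000, "tree.nwk": b"((A,B),C);"})
print(store.diff("run-1", "run-2"))  # {'tree.nwk'}
print(store.stored_bytes())  # 4016
```

Copying every file for every run would keep 8016 bytes here. The deduplicated store keeps 4016 bytes, because the shared 4000-byte input is stored only once.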

Published
22/07/2017
NOGUEIRA, Fabrício; OCAÑA, Kary; SILVA, Vítor; BRAGANHOLO, Vanessa; DE OLIVEIRA, Daniel. Uma Estratégia para Versionamento dos Dados de Workflows Científicos Executados em Nuvem. In: BRAZILIAN E-SCIENCE WORKSHOP (BRESCI), 11., 2017, São Paulo. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2017. p. 37-44. ISSN 2763-8774. DOI: https://doi.org/10.5753/bresci.2017.9920.