SimiFlow: An Architecture for Grouping Workflows by Similarity

  • Vítor Silva UFRJ
  • Fernando Chirigati UFRJ
  • Kely Maia UFRJ
  • Eduardo Ogasawara UFRJ
  • Daniel de Oliveira UFRJ
  • Vanessa Braganholo UFRJ
  • Leonardo Murta UFF
  • Marta Mattoso UFRJ

Abstract


Scientists have been using scientific workflows to support scientific experiments. However, Scientific Workflow Management Systems present some limitations on workflow composition. Experiment lines, a novel approach to dealing with these limitations, allow the representation and systematic composition of an experiment. Nevertheless, many scientific workflows have already been modeled, and they can be leveraged to construct experiment lines by identifying clusters of workflows grouped according to similarity. This paper proposes SimiFlow, an architecture for comparing and clustering workflows based on similarity in order to build experiment lines following a bottom-up approach.
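
As a rough illustration of what grouping workflows by similarity can look like, the sketch below compares workflows pairwise and merges those that are similar enough into one cluster. It is not the SimiFlow implementation: the abstract does not specify a similarity measure or clustering algorithm, so the Jaccard similarity over activity names, the single-link grouping, the threshold value, and all workflow and function names are assumptions made only for this example.

```python
from __future__ import annotations

from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two activity sets (assumed measure, not from the paper)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def group_by_similarity(
    workflows: dict[str, set[str]], threshold: float
) -> list[set[str]]:
    """Single-link grouping: any pair with similarity >= threshold shares a cluster."""
    parent = {name: name for name in workflows}

    def find(x: str) -> str:
        # Follow parent links to the cluster representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in combinations(workflows, 2):
        if jaccard(workflows[u], workflows[v]) >= threshold:
            parent[find(u)] = find(v)

    clusters: dict[str, set[str]] = {}
    for name in workflows:
        clusters.setdefault(find(name), set()).add(name)
    return list(clusters.values())


# Hypothetical workflows, each described only by the names of its activities.
workflows = {
    "wf_a": {"load", "align", "plot"},
    "wf_b": {"load", "align", "export"},
    "wf_c": {"simulate", "render"},
}
clusters = group_by_similarity(workflows, threshold=0.5)
```

With these made-up inputs, wf_a and wf_b share two of four distinct activities and fall into the same cluster, while wf_c stays on its own. In a bottom-up setting, each resulting cluster would be a candidate from which an experiment line could be derived.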

Published
2010-07-20
SILVA, Vítor; CHIRIGATI, Fernando; MAIA, Kely; OGASAWARA, Eduardo; OLIVEIRA, Daniel de; BRAGANHOLO, Vanessa; MURTA, Leonardo; MATTOSO, Marta. SimiFlow: An Architecture for Grouping Workflows by Similarity. In: BRAZILIAN E-SCIENCE WORKSHOP (BRESCI), 4., 2010, Belo Horizonte/MG. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2010. p. 193-200. ISSN 2763-8774.