Uncertainty Management in Bioinformatics Workflow Provenance Databases
Abstract
Provenance databases play an essential role in scientific experiments. The models used to represent such data assume that all provenance relations are certain. However, many experiments are not deterministic, so their results carry uncertainty. Analyzing provenance data in the presence of such uncertainty is not trivial. In this paper, we address the management of non-deterministic provenance data by relying on an extractor component that stores both the provenance data and its corresponding uncertainty values in a probabilistic database. Experiments show an acceptable overhead of 3% in workflow runtime and 16% in query processing time.
Keywords:
Provenance Databases, Uncertainty Management
References
Ahola, V., Aittokallio, T., Vihinen, M. and Uusipaikka, E. (2008). Model-based prediction of sequence alignment quality. Bioinformatics (Oxford, England), v. 24, n. 19, p. 2165–2171.
Boulos, J., Dalvi, N., Mandhani, B., et al. (2005). MYSTIQ: A System for Finding More Answers by Using Probabilities. In Int. Conf. Management of Data (SIGMOD), pp. 891-893.
Chapman, A., Blaustein, B. and Elsaesser, C. (2010). Provenance-based Belief. In Workshop on the Theory and Practice of Provenance (TaPP). p. 11.
Costa, F., Silva, V., De Oliveira, D., et al. (2013). Capturing and Querying Workflow Runtime Provenance with PROV: A Practical Approach. In EDBT/ICDT Workshops, pp. 282-289.
De Oliveira, D., Silva, V. and Mattoso, M. (2015). How Much Domain Data Should Be in Provenance Databases? In Workshop on Theory and Practice of Provenance (TaPP).
Freire, J., Koop, D., Santos, E. and Silva, C. T. (2008). Provenance for Computational Tasks: A Survey. Computing in Science & Engineering, v. 10, n. 3, p. 11–21.
Gonçalves, J. C. de A. R., Oliveira, D. De, Ocaña, K. A. C. S., Ogasawara, E. and Mattoso, M. (2012). Using Domain-Specific Data to Enhance Scientific Workflow Steering Queries. In International Provenance and Annotation Workshop (IPAW), pp. 152–167.
Huang, J., Antova, L., Koch, C. and Olteanu, D. (2009). MayBMS: A Probabilistic Database Management System. In Int. Conf. Management of Data (SIGMOD), pp. 1071-1071.
Idika, N., Varia, M. and Phan, H. (2013). The Probabilistic Provenance Graph. In IEEE Security and Privacy Workshops (SPW), pp. 34-41.
Mattoso, M., Werner, C., Travassos, G. H., et al. (2010). Towards supporting the life cycle of large scale scientific experiments. Int. Journal of Business Process Integration and Management, v. 5, n. 1, p. 79.
Moreau, L., Clifford, B., Freire, J., et al. (2011). The Open Provenance Model core specification (v1.1). Future Generation Computer Systems, v. 27, n. 6, p. 743–756.
Moreau, L. and Missier, P. (2013). The PROV Data Model and Abstract Syntax Notation. W3C Recommendation.
Ocaña, K. A. C. S., Oliveira, D. De, Ogasawara, E., et al. (2011). SciPhy: A Cloud-Based Workflow for Phylogenetic Analysis of Drug Targets in Protozoan Genomes. In Advances in Bioinformatics and Computational Biology, pp. 66-70.
Ogasawara, E., Dias, J., Oliveira, D., et al. (2011). An Algebraic Approach for Data-Centric Scientific Workflows. Proc. of the Int. Conf. on Very Large Data Bases (PVLDB), v. 4, n. 12, p. 1328–1339.
Re, C. and Suciu, D. (2007). Management of Data with Uncertainties. In Conference on Information and Knowledge Management (CIKM), pp. 3-8.
Simmhan, Y. L., Plale, B. and Gannon, D. (2008). Query capabilities of the Karma provenance framework. Concurrency and Computation: Practice and Experience, v. 20, n. 5, p. 441–451.
Published
2016-10-04
How to Cite
TALLARIDA, Gustavo; OCAÑA, Kary; PAES, Aline; BRAGANHOLO, Vanessa; DE OLIVEIRA, Daniel. Uncertainty Management in Bioinformatics Workflow Provenance Databases. In: BRAZILIAN SYMPOSIUM ON DATABASES (SBBD), 31., 2016, Salvador/BA. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2016. p. 181-186. ISSN 2763-8979. DOI: https://doi.org/10.5753/sbbd.2016.24325.
