P+RProv: Prospective+Retrospective Provenance Graphs of Python Scripts

Authors

  • Vitor Gama Lemos, Universidade Federal Fluminense
  • João Felipe Pimentel, Universidade Federal Fluminense
  • Bruno Erbisti, Universidade Federal Fluminense
  • Vanessa Braganholo, Universidade Federal Fluminense

DOI:

https://doi.org/10.5753/jidm.2022.2059

Keywords:

diagrams, prospective provenance, provenance visualization, scripts

Abstract

Advances in technology have enabled scientists to automate more of their scientific experiments. Scripting languages in particular have become popular in scientific settings because their high abstraction level and simplicity allow complex tasks to be specified in fewer steps than in traditional programming languages. For these reasons, many scientists model their experiments as scripts to manage data and keep control over results. However, such experiments usually generate large volumes of data, which makes data analysis and threat mitigation difficult. To fill this gap, we propose P+RProv, an approach that helps scientists understand the structure of Python scripts and their results.

Published

2022-10-03

How to Cite

Gama Lemos, V., Pimentel, J. F., Erbisti, B., & Braganholo, V. (2022). P+RProv: Prospective+Retrospective Provenance Graphs of Python Scripts. Journal of Information and Data Management, 13(4). https://doi.org/10.5753/jidm.2022.2059

Section

Regular Papers