Managing Hypothesis of Scientific Experiments with PhenoManager
Keywords: Scientific Experiment, Phenomena, Hypothesis, Scientific Workflows, Project Management
Scientific research based on computer simulations is complex, since it may involve managing the enormous volumes of data and metadata produced during the life cycle of a scientific experiment, from the formulation of hypotheses to their final evaluation. This wealth of data needs to be structured and managed in a way that makes sense to scientists, so that relevant knowledge can be extracted to contribute to the scientific research process. In addition, a scientific project as a whole may be associated with several different scientific experiments, which in turn may require executions of different scientific workflows, making the management task rather arduous. It becomes even more difficult if we consider that the project tasks must be associated with the execution of such simulations (which may take hours or even days), that the hypotheses of a phenomenon need validation and replication, and that the project team may be geographically dispersed. This article presents an approach called PhenoManager that aims to help scientists manage their scientific projects and the cycle of the scientific method as a whole. PhenoManager can assist the scientist in structuring, validating, and reproducing hypotheses of a phenomenon through computational models that are configurable in the approach. For the evaluation, we used SciPhy, a scientific workflow in the field of bioinformatics, and concluded that the proposed approach brings gains without considerable performance losses.