Towards an Open Science-Based Framework for Software Engineering Controlled (Quasi-)Experiments
Experimental Software Engineering has evolved steadily in recent decades thanks to the community's efforts to provide consolidated training, teaching, and practice. For controlled experiments and quasi-experiments in particular, the software engineering community has discussed the lack of reproducibility and the absence of policies for sharing experimental artifacts such as datasets, baselines, metamodels, repositories, and scripts. These issues hinder controlled experimentation from becoming as rigorous as it is in long-established sciences such as Medicine and Physics. This ongoing work proposes a conceptual framework for software engineering controlled experiments and quasi-experiments based on the main principles and practices of Open Science. Open Science is understood to be one of the pillars of the evolution of science and, consequently, of software engineering. FAIR data, metadata, repositories, curation, and provenance are among the main practices discussed in this paper. Ongoing activities are described in terms of how they are being performed and how they relate to prospective ones.