Exploring Architectural Solutions for Implementing the FAIR Principles in Big Data Environments

  • João P. C. Castro, University of São Paulo (USP) / Federal University of Minas Gerais (UFMG)
  • Cristina D. Aguiar, University of São Paulo (USP)

Abstract


The concept of Open Science has emerged as a major enabler for scientific collaboration. To develop repositories adhering to this concept, the FAIR Principles have been proposed. However, fulfilling these principles can be challenging when dealing with a significant volume, variety, and velocity of data and metadata. A suitable solution is to develop a Software Reference Architecture (SRA) that considers the characteristics of big data environments and the FAIR Principles. Despite the importance of this solution for Open Science, existing literature lacks a big data SRA that achieves full FAIR compliance. In our research, we address this gap by proposing architectural solutions for the implementation of FAIR-compliant big data sharing repositories. We validate these solutions through case studies and performance evaluations. Future contributions include developing algorithms to instantiate the proposed architectures and creating FAIR-compliant artificial datasets to assist in further validations.
Keywords: Open Science, FAIR Principles, Software Reference Architecture

Published
2023-09-25
CASTRO, João P. C.; AGUIAR, Cristina D. Exploring Architectural Solutions for Implementing the FAIR Principles in Big Data Environments. In: WORKSHOP ON THESIS AND DISSERTATION (WTDBD) - BRAZILIAN SYMPOSIUM ON DATABASES (SBBD), 38., 2023, Belo Horizonte/MG. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2023. p. 138-144. DOI: https://doi.org/10.5753/sbbd_estendido.2023.232886.