Exploring Architectural Solutions for Implementing the FAIR Principles in Big Data Environments
Abstract
The concept of Open Science has emerged as a major enabler for scientific collaboration. To develop repositories adhering to this concept, the FAIR Principles have been proposed. However, fulfilling these principles can be challenging when dealing with a significant volume, variety, and velocity of data and metadata. A suitable solution is to develop a Software Reference Architecture (SRA) that considers the characteristics of big data environments and the FAIR Principles. Despite the importance of this solution for Open Science, existing literature lacks a big data SRA that achieves full FAIR compliance. In our research, we address this gap by proposing architectural solutions for the implementation of FAIR-compliant big data sharing repositories. We validate these solutions through case studies and performance evaluations. Future contributions include developing algorithms to instantiate the proposed architectures and creating FAIR-compliant artificial datasets to assist in further validations.
Keywords:
Open Science, FAIR Principles, Software Reference Architecture
References
Ataei, P. and Litchfield, A. (2021). NeoMycelia: A software reference architecture for big data systems. In Proc. APSEC, pages 452–462.
Borges, V. et al. (2022). A platform to generate FAIR data for COVID-19 clinical research in Brazil. In Proc. ICEIS, pages 218–225.
Castro, J. P. C. et al. (2022a). FAIR Principles and Big Data: A software reference architecture for Open Science. In Proc. ICEIS, pages 27–38.
Castro, J. P. C. et al. (2022b). Open Science in the cloud: The CloudFAIR architecture for FAIR-compliant repositories. In Proc. ADBIS, pages 56–66.
Chen, M., Mao, S., and Liu, Y. (2014). Big data: A survey. Mob. Netw. Appl., 19(2):171–209.
Davoudian, A. and Liu, M. (2020). Big data systems: A software engineering perspective. ACM Comput. Surv., 53(5):1–39.
Deng, N. et al. (2022). ImmuneData: an integrated data discovery system for immunology data repositories. Database, 2022.
Kimball, R. and Ross, M. (2011). The data warehouse toolkit: the complete guide to dimensional modeling. John Wiley & Sons.
Medeiros, C. B. et al. (2020). IAP input into the UNESCO Open Science Recommendation. Available at [link]. Accessed on April 8, 2023.
Nakagawa, E. Y., Antonino, P. O., and Becker, M. (2011). Reference architecture and product line architecture: A subtle but critical difference. In Proc. ECSA, pages 207–211.
Sawadogo, P. and Darmont, J. (2021). On data lake architectures and metadata management. J. Intell. Inf. Syst., 56(1):97–120.
Vazquez, P. et al. (2022). Globally accessible distributed data sharing (GADDS): A decentralized FAIR platform to facilitate data sharing in the life sciences. Bioinformatics, 38:3812–3817.
Wilkinson, M. D. et al. (2016). The FAIR Guiding Principles for scientific data management and stewardship. Sci. Data, 3(1):1–9.
Published
2023-09-25
How to Cite
CASTRO, João P. C.; AGUIAR, Cristina D. Exploring Architectural Solutions for Implementing the FAIR Principles in Big Data Environments. In: WORKSHOP ON THESIS AND DISSERTATION (WTDBD) - BRAZILIAN SYMPOSIUM ON DATABASES (SBBD), 38., 2023, Belo Horizonte/MG. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2023. p. 138-144. DOI: https://doi.org/10.5753/sbbd_estendido.2023.232886.
