Big Spatial Data Integration and Enrichment with Provenance Control

Abstract


In the last few years, a growing number of devices have generated vast amounts of data, commonly called Big Data. This phenomenon brings many opportunities, and challenges, for knowledge discovery, as distributed and heterogeneous data may be combined and used to create high-quality models of events and phenomena. However, integrating data and applying transformations to it raises questions about integrity, quality, and veracity. Our investigation aims to create a generic model for integrating data that allows data enrichment while maintaining provenance information.

Keywords: Big data, Spatial data, Data integration, Provenance, Data Provenance

Published
2022-09-19
PINTOR, Paulo; MOREIRA, José; COSTA, Rogério Luís de Carvalho. Big Spatial Data Integration and Enrichment with Provenance Control. In: WORKSHOP ON THESIS AND DISSERTATION (WTDBD) - BRAZILIAN SYMPOSIUM ON DATABASES (SBBD), 37., 2022, Búzios. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2022. p. 140-146. DOI: https://doi.org/10.5753/sbbd_estendido.2022.21856.