Stream and Historical Data Integration using SQL as Standard Language

Abstract


Data heterogeneity makes it difficult to integrate streaming data with other streams ('streaming x streaming') and with historical data ('streaming x historical'). For practical analysis, enriching and contextualizing data from historical and streaming sources would benefit from approaches that facilitate integration by abstracting the details and formats of the primary sources. This work presents a framework that integrates streaming and historical data in real time and abstracts the syntactic aspects of queries by using SQL as a standard language for querying heterogeneous sources. The framework was evaluated in an experiment using a relational database and real data produced by sensors. The results point to the feasibility of the approach.
Keywords: Data Integration, Streaming Data, Historical Data, SQL, Continuous Query
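
As a rough illustration of the 'streaming x historical' case, the sketch below joins a micro-batch of streamed sensor readings with stored historical data through a single SQL query. It is not the framework described in the paper: an in-memory SQLite database stands in for the relational source, and the table names (`sensor_history`, `stream_window`), column names, and the helpers `make_connection` and `enrich_batch` are all hypothetical.

```python
import sqlite3


def make_connection():
    """Create an in-memory database holding hypothetical historical sensor data."""
    conn = sqlite3.connect(":memory:")
    # Historical side: one row per sensor (assumed schema).
    conn.execute(
        "CREATE TABLE sensor_history ("
        "sensor_id TEXT PRIMARY KEY, location TEXT, max_expected REAL)"
    )
    # Streaming side: a staging table that holds only the current micro-batch.
    conn.execute(
        "CREATE TABLE stream_window (sensor_id TEXT, ts INTEGER, temperature REAL)"
    )
    conn.executemany(
        "INSERT INTO sensor_history VALUES (?, ?, ?)",
        [("s1", "lab", 30.0), ("s2", "roof", 45.0)],
    )
    return conn


def enrich_batch(conn, batch):
    """Replace the window with one micro-batch and enrich it via a SQL join."""
    conn.execute("DELETE FROM stream_window")
    conn.executemany(
        "INSERT INTO stream_window (sensor_id, ts, temperature) VALUES (?, ?, ?)",
        batch,
    )
    query = """
        SELECT s.sensor_id, s.ts, s.temperature, h.location,
               s.temperature > h.max_expected AS above_expected
        FROM stream_window AS s
        JOIN sensor_history AS h ON h.sensor_id = s.sensor_id
        ORDER BY s.ts
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    connection = make_connection()
    for row in enrich_batch(connection, [("s1", 1, 31.5), ("s2", 2, 40.0)]):
        print(row)
```

Calling `enrich_batch` once per arriving batch approximates a continuous query: the SQL text stays the same while the contents of the window change.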

Published
October 4, 2021
AMARÁ, Jefferson; STRÖELE, Victor; BRAGA, Regina; DANTAS, Mário; BAUER, Michael. Stream and Historical Data Integration using SQL as Standard Language. In: SIMPÓSIO BRASILEIRO DE BANCO DE DADOS (SBBD), 36., 2021, Rio de Janeiro. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2021. p. 193-204. ISSN 2763-8979. DOI: https://doi.org/10.5753/sbbd.2021.17877.