Integrating Heterogeneous Stream and Historical Data Sources using SQL

Authors

  • Jefferson Amará, Federal University of Juiz de Fora
  • Victor Ströele, Federal University of Juiz de Fora
  • Regina Braga, Federal University of Juiz de Fora
  • Mário Dantas, Federal University of Juiz de Fora
  • Michael Bauer, University of Western Ontario

DOI:

https://doi.org/10.5753/jidm.2022.2488

Keywords:

Data Integration, Heterogeneous Data, Historical Data, Streaming Data, Structured Query Language (SQL)

Abstract

Applications that integrate data from historical and streaming sources can support better-contextualized and richer decision-making. However, integrating data across heterogeneous sources is complex, which makes querying in this context difficult. Approaches that facilitate data integration by abstracting the details and formats of the primary sources can meet this need. This work presents a framework that integrates streaming and historical data in real time, abstracting the syntactic aspects of queries by using SQL as a standard language for querying heterogeneous sources. The framework was evaluated in an experiment using relational datasets and real data produced by sensors. The results point to the feasibility of the approach.

Published

2022-09-12

How to Cite

Amará, J., Ströele, V., Braga, R., Dantas, M., & Bauer, M. (2022). Integrating Heterogeneous Stream and Historical Data Sources using SQL. Journal of Information and Data Management, 13(2). https://doi.org/10.5753/jidm.2022.2488

Section

SBBD 2021 Full papers - Extended Papers