Data Lakehouses for Large-Scale Geospatial Data Analysis
Abstract
Data Warehouses and Data Lakes are architectures capable of handling complex analyses; however, the growing volume of geospatial data, driven by the Internet of Things, exposes the limitations of both architectures. Data Lakehouses have emerged as the new state of the art for Big Data storage, offering an integrated and cost-effective solution. This paper proposes the use of Data Lakehouses as an environment for storing and analyzing Big Geospatial Data. In addition, a case study with geolocation data from municipal buses was conducted to demonstrate the feasibility of the proposed environment.
Keywords:
Data Lakehouses, Geospatial Data, Data Analysis
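The following sketch illustrates the kind of analysis such an environment might run in a case study over bus geolocation data. It computes the total distance traveled by each bus from its GPS pings using the haversine great-circle formula. It is a minimal sketch under stated assumptions, not the authors' method. The record layout (bus_id, ISO timestamp, latitude, longitude) and the function names are hypothetical. The in-memory list also stands in for what would, in a lakehouse, be a table scanned by a distributed query engine.

```python
import math
from collections import defaultdict
from datetime import datetime

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius in kilometres


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def distance_per_bus(pings):
    """Sum the haversine distance between consecutive pings of each bus.

    pings: iterable of (bus_id, iso_timestamp, lat, lon) tuples.
    This is a hypothetical schema chosen for illustration only.
    """
    by_bus = defaultdict(list)
    for bus_id, ts, lat, lon in pings:
        by_bus[bus_id].append((datetime.fromisoformat(ts), lat, lon))

    totals = {}
    for bus_id, points in by_bus.items():
        points.sort()  # order each bus's trajectory by timestamp before pairing
        totals[bus_id] = sum(
            haversine_km(p[1], p[2], q[1], q[2]) for p, q in zip(points, points[1:])
        )
    return totals
```

For example, pings for "bus-1" at longitudes 0, 1 and 2 on the equator, one minute apart, yield about 222.39 km regardless of the order in which they arrive, because the trajectory is sorted by timestamp first. A bus with a single ping contributes 0 km.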
Published: 2024-10-14
How to Cite
F. VASCONCELOS, Felipe; J. COUTINHO, Fábio. Data Lakehouses for Large-Scale Geospatial Data Analysis. In: BRAZILIAN SYMPOSIUM ON DATABASES (SBBD), 39., 2024, Florianópolis/SC. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2024. p. 722-728. ISSN 2763-8979. DOI: https://doi.org/10.5753/sbbd.2024.243648.
