Logical Data Lakes as Platforms for Government Data in Smart Societies and Cities
Abstract
Data lakes have received attention from corporate, academic, and government entities. This new approach to storing data has proven versatile for developing secure platforms that guarantee data privacy, quality, and governance. To take advantage of these characteristics and contribute to value generation for government policies, we present a data lake architecture used to integrate government applications and data from the state of Alagoas, Brazil. As a preliminary result, we show a query result for the geographic distribution of users across the systems integrated through the proposed architecture.
Keywords:
Logical Data Lakes, Government Data, Smart Cities
References
Al-Ahmad, A. S. and Kahtan, H. (2018). Cloud Computing Review: Features and Issues. In International Conference on Smart Computing and Electronic Enterprise (ICSCEE’18).
Attard, J., Orlandi, F., Scerri, S., and Auer, S. (2015). A Systematic Review of Open Government Data Initiatives. Government Information Quarterly, 32(4):399–418.
Bozic, K. and Dimovski, V. (2019). Business Intelligence and Analytics for Value Creation: The Role of Absorptive Capacity. International Journal of Information Management, 46:93–103.
Fang, H. (2015). Managing data lakes in big data era: What’s a data lake and why has it became popular in data management ecosystem. In 2015 IEEE International Conference on Cyber Technology in Automation, Control, and Intelligent Systems (CYBER), pages 820–824. IEEE.
Gorelik, A. (2019). The Enterprise Big Data Lake: Delivering the Promise of Big Data and Data Science. O’Reilly Media.
He, W. and Da Xu, L. (2012). Integration of Distributed Enterprise Applications: A Survey. IEEE Transactions on Industrial Informatics, 10(1):35–42.
Heaton, J. (2016). An Empirical Analysis of Feature Engineering for Predictive Modeling. In IEEE Region 3 South East Conference (SoutheastCon’16).
Huai, Y., Chauhan, A., Gates, A., Hagleitner, G., Hanson, E. N., O’Malley, O., Pandey, J., Yuan, Y., Lee, R., and Zhang, X. (2014). Major Technical Advancements in Apache Hive. In Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, pages 1235–1246.
IRENA Group (2019). Innovation Landscape Brief: Internet of Things. ISBN 978-92-9260-142-3, International Renewable Energy Agency, Abu Dhabi, United Arab Emirates.
Jetzek, T., Avital, M., and Bjorn-Andersen, N. (2014). Data-driven Innovation Through Open Government Data. Journal of Theoretical and Applied Electronic Commerce Research, 9(2):100–120.
Li, Y., Zhang, A., Zhang, X., and Wu, Z. (2018). A Data Lake Architecture for Monitoring and Diagnosis System of Power Grid. In Artificial Intelligence and Cloud Computing Conference (AICC’18).
Liaqat, M., Chang, V., Gani, A., Ab Hamid, S. H., Toseef, M., Shoaib, U., and Ali, R. L. (2017). Federated Cloud Resource Management: Review and Discussion. Journal of Network and Computer Applications, 77:87–105.
Mami, M. N., Graux, D., Scerri, S., Jabeen, H., Auer, S., and Lehmann, J. (2019). Uniform Access to Multiform Data Lakes Using Semantic Technologies. In 21st International Conference on Information Integration and Web-based Applications & Services (IIWAS’19).
Marx, V. (2013). The Big Challenges of Big Data. Nature, 498(7453):255–260.
Mehmood, H., Gilman, E., Cortes, M., Kostakos, P., Byrne, A., Valta, K., Tekes, S., and Riekki, J. (2019). Implementing Big Data Lake for Heterogeneous Data Sources. In IEEE 35th International Conference on Data Engineering Workshops (ICDEW’19).
Pereira, G. V., Macadar, M. A., Luciano, E. M., and Testa, M. G. (2017). Delivering Public Value Through Open Government Data Initiatives in a Smart City Context. Information Systems Frontiers, 19(2):213–229.
Provost, F. and Fawcett, T. (2013). Data Science and its Relationship to Big Data and Data-driven Decision Making. Big data, 1(1):51–59.
Sawadogo, P. and Darmont, J. (2021). On Data Lake Architectures and Metadata Management. Journal of Intelligent Information Systems, 56(1):97–120.
Sethi, R., Traverso, M., Sundstrom, D., Phillips, D., Xie, W., Sun, Y., Yegitbasi, N., Jin, H., Hwang, E., Shingte, N., et al. (2019). Presto: SQL on Everything. In 2019 IEEE 35th International Conference on Data Engineering (ICDE), pages 1802–1813. IEEE.
Shvachko, K., Kuang, H., Radia, S., and Chansler, R. (2010). The Hadoop Distributed File System. In IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST’10).
Stefanovic, D., Marjanovic, U., Delić, M., Culibrk, D., and Lalic, B. (2016). Assessing the Success of E-government Systems: An Employee Perspective. Information & Management, 53(6):717–726.
Stefanowski, J., Krawiec, K., and Wrembel, R. (2017). Exploring Complex and Big Data. International Journal of Applied Mathematics and Computer Science, 27(4):669–679.
Wadkar, S. and Siddalingaiah, M. (2014). Apache Ambari. In Pro Apache Hadoop, pages 399–401. Springer.
Welch, E. W., Hinnant, C. C., and Moon, M. J. (2005). Linking Citizen Satisfaction With E-Government and Trust in Government. Journal of Public Administration Research and Theory, 15(3):371–391.
Zagan, E. and Danubianu, M. (2020). Data Lake Approaches: A Survey. In International Conference on Development and Application Systems (DAS’20).
Zaharia, M., Xin, R. S., Wendell, P., Das, T., Armbrust, M., Dave, A., Meng, X., Rosen, J., Venkataraman, S., Franklin, M. J., et al. (2016). Apache Spark: A Unified Engine For Big Data Processing. Communications of the ACM, 59(11):56–65.
Published
2022-07-31
How to Cite
RAMOS, Geymerson S.; FERNANDES, Danilo; COELHO, Jorge Artur P. de M.; AQUINO, Andre L. L. Logical Data Lakes as Platforms for Government Data in Smart Societies and Cities. In: LATIN AMERICAN SYMPOSIUM ON DIGITAL GOVERNMENT (LASDIGOV), 10., 2022, Niterói. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2022. p. 215-226. ISSN 2763-8723. DOI: https://doi.org/10.5753/wcge.2022.223047.
