Data governance methodologies and tools applied to data lake management: a systematic review
Abstract
A data lake is a large centralized repository used to store data of any type, with no restrictions on format or structure. This flexibility, however, can turn the repository into a data swamp, a situation in which it accumulates disorganized, inconsistent, or low-value information. Avoiding that outcome requires effective data governance practices that ensure data is stored appropriately and managed efficiently. Given the gaps in the literature on implementing governance in data lakes, this study proposes a systematic literature review aimed at identifying the methodologies and tools used to manage these repositories.
References
Bližnák, K., Munk, M., and Pilková, A. (2024). A systematic review of recent literature on data governance (2017–2023). IEEE Access, 12:149875–149888.
Cherradi, M., Bouhafer, F., and Haddadi, A. E. (2023). Data lake governance using IBM-Watson knowledge catalog. Scientific African, 21.
Cherradi, M. and El Haddadi, A. (2024). Enhancing data lake management systems with LDA approach. Journal of Data Science and Intelligent Systems, 3(1):58–66.
DAMA International (2017). DAMA-DMBOK: Data Management Body of Knowledge. Technics Publications, USA, 2nd edition.
Derakhshannia, M., Gervet, C., Hajj-Hassan, H., Laurent, A., and Martin, A. (2020). Data lake governance: Towards a systemic and natural ecosystem analogy. Future Internet, 12:1–16.
Derakhshannia, M., Laurent, A., and Martin, A. (2023). Mixing biology and computer science concepts to design resilient data lakes. Journal of Interdisciplinary Methodologies and Issues in Science, 11.
Galvão, M. C. B. and Ricarte, I. L. M. (2020). Revisão sistemática da literatura: conceituação, produção e publicação. LOGEION: Filosofia da informação, 6:57–63.
Garriga, M., Aarns, K., Tsigkanos, C., Tamburri, D. A., and Heuvel, W. V. D. (2021). DataOps for cyber-physical systems governance: The airport passenger flow case. ACM Transactions on Internet Technology, 21(2):Article 36, 25 pages.
Giebler, C., Gröger, C., Hoos, E., Schwarz, H., and Mitschang, B. (2020). A zone reference model for enterprise-grade data lake management. In 2020 IEEE 24th International Enterprise Distributed Object Computing Conference (EDOC), pages 57–66, Eindhoven, Netherlands.
Gyulgyulyan, E. and Astsatryan, H. (2023). Alert system for data quality in data lakes. In CSIT Conference 2023, Yerevan, Armenia.
Hamadou, H. B., Bach Pedersen, T., and Thomsen, C. (2020). The Danish national energy data lake: Requirements, technical architecture, and tool selection. In 2020 IEEE International Conference on Big Data (Big Data), pages 1523–1532, Atlanta, GA, USA.
Ishwarappa and Anuradha, J. (2015). A brief introduction on big data 5Vs characteristics and Hadoop technology. Procedia Computer Science, 48:319–324.
Nambiar, A. and Mundra, D. (2022). An overview of data warehouse and data lake in modern enterprise data management. Big Data and Cognitive Computing, 6(4):132.
O’Brien, M. A., Mohally, D., Brasche, G. P., and Sanfilippo, A. G. (2022). Huawei and international data spaces. In Otto, B., ten Hompel, M., and Wrobel, S., editors, Designing Data Spaces. Springer, Cham.
Plebani, P., Kat, R., Pallas, F., Werner, S., Inches, G., Laud, P., and Santiago, R. (2023). TEADAL: Trustworthy, energy-aware federated data lakes along the computing continuum. In CEUR Workshop Proceedings, volume 3413, pages 28–35.
Sarramia, D., Claude, A., Ogereau, F., Mezhoud, J., and Mailhot, G. (2022). CEBA: A data lake for data sharing and environmental monitoring. Sensors, 22:2733.
Sosa, D. and Paciello, J. (2021). Data lake: A case of study of a big data analytics architecture for public procurements. In 2021 Eighth International Conference on eDemocracy & eGovernment (ICEDEG), pages 194–198, Quito, Ecuador.
Wang, H., Adenutsi, C. D., Wang, C., Sun, Z., Zhang, Y., Li, Y., Zhang, Z., and Wang, J. (2023). Construction and application of a big data system for regional lakes in coalbed methane development. ACS Omega, 8(20):18323–18331.
Published
12/08/2025
How to Cite
SANTOS, Wyllyany C.; LIMA, David H. S.; SILVA, Carlos A. F.; FERRO, Márcio R. C. Metodologias e ferramentas de governança de dados aplicadas ao gerenciamento de data lakes: uma revisão sistemática. In: ESCOLA REGIONAL DE COMPUTAÇÃO BAHIA, ALAGOAS E SERGIPE (ERBASE), 25., 2025, Lagarto/SE. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 336-344. DOI: https://doi.org/10.5753/erbase.2025.13809.
