ARANI: An Experiment Line-Based Approach to Privacy Preservation in Data Lakes
Abstract
Data lakes store large volumes of heterogeneous data, including sensitive information. Ensuring compliance with regulations such as the LGPD (Brazil's General Data Protection Law) requires anonymization techniques. Techniques applied in isolation, such as k-anonymity or differential privacy, can be insufficient, so combining them in configurable flows is essential. Experiment Lines make it possible to structure and instantiate these flows flexibly. This paper proposes ARANI, an Experiment Line-based approach for defining, executing, and evaluating anonymization flows with support for multiple techniques.
Keywords:
Security, Privacy, Anonymization, Data Lake, Experiment Line
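To make the abstract's idea of a configurable anonymization flow more concrete, the following is a minimal sketch in Python, not ARANI's actual implementation. It assumes a flow is an ordered list of interchangeable steps over records. All names here (`generalize_age`, `suppress_small_groups`, `laplace_noise`, `run_flow`, `is_k_anonymous`) and the sample data are hypothetical. The noise step only perturbs a numeric field with Laplace-distributed noise; calibrating it for a formal differential privacy guarantee is out of scope.

```python
import random
from collections import Counter
from typing import Any, Callable, Dict, List, Sequence

Record = Dict[str, Any]
Step = Callable[[List[Record]], List[Record]]


def group_key(record: Record, quasi_ids: Sequence[str]) -> tuple:
    """Combination of quasi-identifier values that defines an equivalence class."""
    return tuple(record[q] for q in quasi_ids)


def generalize_age(width: int) -> Step:
    """Step: replace exact ages with ranges of the given width."""
    def step(records: List[Record]) -> List[Record]:
        out = []
        for r in records:
            low = (int(r["age"]) // width) * width
            out.append({**r, "age": f"{low}-{low + width - 1}"})
        return out
    return step


def suppress_small_groups(quasi_ids: Sequence[str], k: int) -> Step:
    """Step: drop records whose equivalence class has fewer than k members."""
    def step(records: List[Record]) -> List[Record]:
        counts = Counter(group_key(r, quasi_ids) for r in records)
        return [r for r in records if counts[group_key(r, quasi_ids)] >= k]
    return step


def laplace_noise(field: str, sensitivity: float, epsilon: float,
                  rng: random.Random) -> Step:
    """Step: add Laplace(0, sensitivity / epsilon) noise to a numeric field."""
    scale = sensitivity / epsilon
    def step(records: List[Record]) -> List[Record]:
        out = []
        for r in records:
            # The difference of two i.i.d. exponentials with mean `scale`
            # follows a Laplace distribution centered at 0 with that scale.
            noise = rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
            out.append({**r, field: r[field] + noise})
        return out
    return step


def run_flow(records: List[Record], steps: Sequence[Step]) -> List[Record]:
    """Apply the configured steps in order; each step returns new records."""
    for step in steps:
        records = step(records)
    return records


def is_k_anonymous(records: List[Record], quasi_ids: Sequence[str], k: int) -> bool:
    """Evaluation: every equivalence class must contain at least k records."""
    counts = Counter(group_key(r, quasi_ids) for r in records)
    return all(c >= k for c in counts.values())


# Hypothetical sample data and one possible flow configuration.
records = [
    {"age": 23, "zip": "100", "income": 3000.0},
    {"age": 27, "zip": "100", "income": 4200.0},
    {"age": 25, "zip": "100", "income": 3900.0},
    {"age": 41, "zip": "200", "income": 8000.0},
]
flow = [
    generalize_age(10),
    suppress_small_groups(["age", "zip"], k=2),
    laplace_noise("income", sensitivity=100.0, epsilon=1.0, rng=random.Random(0)),
]
released = run_flow(records, flow)
```

Because every step shares the same signature, a different flow can be instantiated by reordering, swapping, or re-parameterizing the entries of `flow`, and `is_k_anonymous` can be used to check each variant.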
Published
29/09/2025
How to Cite
JORDÃO, Thiago; BEDO, Marcos; DE OLIVEIRA, Daniel. ARANI: Uma Abordagem Baseada em Linha de Experimento para Preservação de Privacidade em Data Lakes. In: SIMPÓSIO BRASILEIRO DE BANCO DE DADOS (SBBD), 40., 2025, Fortaleza/CE. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 844-850. ISSN 2763-8979. DOI: https://doi.org/10.5753/sbbd.2025.247756.
