Differential Privacy in Polystore Systems: A Practical Approach
Abstract
Several techniques guarantee data privacy, especially in Database Management Systems (DBMSs). Nowadays, however, many organizations store data in its raw format in data lakes. Because this data comes in multiple formats, Polystore systems are used to query it in an integrated way. Yet Polystore systems do not address privacy themselves, delegating this responsibility to the underlying DBMSs. In this paper, we propose an approach called DIMPLY that couples privacy mechanisms to Polystore systems. DIMPLY users submit queries in the Polystore system syntax and receive anonymized results. We chose differential privacy as the privacy technique. To evaluate DIMPLY, we used a dataset of exams of suspected Zika cases in Brazil.
Keywords:
Differential Privacy, Polystore Systems, GDPR
References
Backstrom, L., Dwork, C., and Kleinberg, J. (2007). Wherefore art thou r3579x? anonymized social networks, hidden patterns, and structural steganography. In WWW’07, pages 181-190.
de Lourdes Maia Silva, M., Chaves, I. C., and Machado, J. C. (2021). Private reverse top-k algorithms applied on public data of COVID-19 in the state of Ceará. J. Inf. Data Manag., 12(5).
de Oliveira, D., Neto, E. R. D., et al. (2019). Um estudo comparativo de mecanismos de privacidade diferencial sobre um dataset de ocorrências do ZIKV no Brasil. In Proc. of the 34th SBBD, pages 253-258. SBC.
Duggan, J., Elmore, A. J., Stonebraker, M., Balazinska, M., Howe, B., Kepner, J., Madden, S., Maier, D., Mattson, T., and Zdonik, S. (2015). The bigdawg polystore system. ACM Sigmod Record, 44(2):11-16.
Dwork, C., McSherry, F., Nissim, K., and Smith, A. (2006). Calibrating noise to sensitivity in private data analysis. In Theory of cryptography conference, pages 265-284. Springer.
Dwork, C., Roth, A., et al. (2014). The algorithmic foundations of differential privacy. Foundations and Trends in Theoretical Computer Science, 9(3-4):211-407.
Erlingsson, Ú., Pihur, V., and Korolova, A. (2014). Rappor: Randomized aggregatable privacy-preserving ordinal response. In SIGSAC’14, pages 1054-1067.
Ge, C., He, X., Ilyas, I. F., and Machanavajjhala, A. (2019). Apex: Accuracy-aware differentially private data exploration. In SIGMOD ’19, pages 177-194.
Johnson, N., Near, J. P., and Song, D. (2018). Towards practical differential privacy for sql queries. Proceedings of the VLDB Endowment, 11(5):526-539.
Kraska, T., Stonebraker, M., Brodie, M. L., Servan-Schreiber, S., and Weitzner, D. J. (2019). Schengendb: A data protection database proposal. In Poly’19, volume 11721, pages 24-38. Springer.
Machanavajjhala, A., Kifer, D., Gehrke, J., and Venkitasubramaniam, M. (2007). l-diversity: Privacy beyond k-anonymity. ACM TKDD, 1(1):3-es.
McSherry, F. D. (2009). Privacy integrated queries: an extensible platform for privacy-preserving data analysis. In SIGMOD’09, pages 19-30.
Mendes, Y., de Oliveira, D., and Ströele, V. (2020). Polyflow: a polystore-compliant mechanism to provide interoperability to heterogeneous provenance graphs. J. Inf. Data Manag., 11(3).
Nargesian, F., Zhu, E., Miller, R. J., Pu, K. Q., and Arocena, P. C. (2019). Data lake management: Challenges and opportunities. Proc. VLDB Endow., 12(12):1986-1989.
Proserpio, D., Goldberg, S., and McSherry, F. (2014). Calibrating data to sensitivity in private data analysis: A platform for differentially-private analysis of weighted datasets. PVLDB, 7(8):637-648.
Ramos, L. F. M. and Silva, J. M. C. (2019). Privacy and data protection concerns regarding the use of blockchains in smart cities. In ICEGOV’2019, pages 342-347, Melbourne, Australia. ACM.
Sweeney, L. (2002). k-anonymity: A model for protecting privacy. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 10(05):557-570.
Warner, S. L. (1965). Randomized response: A survey technique for eliminating evasive answer bias. Journal of the American Statistical Association, 60(309):63-69.
Published
2022-09-19
How to Cite
BERTELLI, Lucas; STRÖELE, Victor; MACHADO, Javam; DE OLIVEIRA, Daniel. Differential Privacy in Polystore Systems: A Practical Approach. In: BRAZILIAN SYMPOSIUM ON DATABASES (SBBD), 37., 2022, Búzios. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2022. p. 279-291. ISSN 2763-8979. DOI: https://doi.org/10.5753/sbbd.2022.224305.
