Data Stream Anonymization in DOCA
Abstract
Online scenarios are increasingly common and provide great opportunities for data analysis. The data they produce usually contains sensitive information and should be anonymized to guarantee individuals’ privacy. This work proposes DOCA, a differentially private approach for publishing data streams in non-interactive scenarios, which uses an online microaggregation strategy to obtain better utility.
Keywords:
Data streaming, sensitive information, micro-aggregation
References
Bindschaedler, V., Shokri, R., and Gunter, C. A. (2017). Plausible deniability for privacy-preserving data synthesis. Proc. VLDB Endow., 10(5):481–492.
Cao, J., Carminati, B., Ferrari, E., and Tan, K. L. (2011). Castle: Continuously anonymizing data streams. IEEE Transactions on Dependable and Secure Computing, 8(3):337–352.
Chen, R., Mohammed, N., Fung, B. C. M., Desai, B. C., and Xiong, L. (2011). Publishing set-valued data via differential privacy. PVLDB, 4(11):1087–1098.
Dwork, C. and Roth, A. (2014). The algorithmic foundations of differential privacy. Found. Trends Theor. Comput. Sci., 9(3–4):211–407.
Backblaze (2017). The raw hard drive test data from 2017-01-01 to 2017-01-31. Online at https://www.backblaze.com/b2/hard-drive-test-data.html. Accessed 2018-04-22.
Silva, J. A., Faria, E. R., Barros, R. C., Hruschka, E. R., Carvalho, A. C. P. L. F. d., and Gama, J. (2013). Data stream clustering: A survey. ACM Comput. Surv., 46(1):13:1–13:31.
Soria-Comas, J. and Domingo-Ferrer, J. (2017). Differentially private data sets based on microaggregation and record perturbation. In MDAI 2017, Kitakyushu, Japan, October, 2017, Proceedings, pages 119–131.
Soria-Comas, J., Domingo-Ferrer, J., Sanchez, D., and Martínez, S. (2014). Enhancing data utility in differential privacy via microaggregation-based k-anonymity. The VLDB Journal, 23(5):771–794.
Sweeney, L. (2002). k-anonymity: A model for protecting privacy. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 10(5):557–570.
Xu, J., Zhang, Z., Xiao, X., Yang, Y., Yu, G., and Winslett, M. (2013). Differentially private histogram publication. The VLDB Journal, 22(6):797–822.
Zhang, J., Cormode, G., Procopiuc, C. M., Srivastava, D., and Xiao, X. (2017). PrivBayes: Private data release via Bayesian networks. ACM Trans. Database Syst., 42(4):25:1–25:41.
Published
2018-08-25
How to Cite
LEAL, Bruno C.; VIDAL, Israel C.; MACHADO, Javam C. Data Stream Anonymization in DOCA. In: BRAZILIAN SYMPOSIUM ON DATABASES (SBBD), 33., 2018, Rio de Janeiro. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2018. p. 295-300. ISSN 2763-8979. DOI: https://doi.org/10.5753/sbbd.2018.22246.
