Privacy-preserving of patients with Differential Privacy: an experimental evaluation in COVID-19 dataset

Authors

  • Manuel E. B. Filho, Universidade Federal do Ceará
  • Eduardo R. Duarte Neto, Universidade Federal do Ceará
  • Javam C. Machado, Universidade Federal do Ceará

DOI:

https://doi.org/10.5753/jidm.2021.1947

Keywords:

COVID-19, differentially private publication, data analysis

Abstract

The COVID-19 pandemic has brought new challenges to health systems in almost every part of the world, many of which are overburdened. Data analysis has supported the fight against the coronavirus: through it, government authorities and health care providers have been able to adopt effective strategies. Those strategies, however, cannot disregard privacy concerns, since privacy is a right of every citizen. Privacy techniques make it possible to analyze health data without exposing individuals’ private information, but a balance between data privacy and utility is essential for a sound analysis. This work demonstrates that it is possible to guarantee the privacy of infected patients while maintaining the utility of the data. It does so by visualizing the results of applying differentially private mechanisms to queries over the data of patients tested in the State of Ceará, Brazil.
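The abstract does not name the mechanisms that were evaluated, so the following is only a minimal sketch of the general idea, assuming the Laplace mechanism applied to a counting query. The records, field names, and the functions `laplace_noise` and `private_count` are hypothetical and are not taken from the paper or from the Ceará dataset.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one sample from Laplace(0, scale).

    The difference of two independent exponential variables with
    rate 1/scale follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon: float, rng: random.Random) -> float:
    """Answer a counting query with epsilon-differential privacy.

    Adding or removing one patient changes a count by at most 1,
    so the sensitivity is 1 and the noise scale is 1 / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for record in records if predicate(record))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Invented toy records (assumption): not real patient data.
records = [
    {"municipality": "Fortaleza", "result": "positive"},
    {"municipality": "Fortaleza", "result": "negative"},
    {"municipality": "Sobral", "result": "positive"},
    {"municipality": "Caucaia", "result": "positive"},
    {"municipality": "Sobral", "result": "negative"},
]


def is_positive(record) -> bool:
    return record["result"] == "positive"


if __name__ == "__main__":
    rng = random.Random()
    for epsilon in (0.1, 1.0, 10.0):
        noisy = private_count(records, is_positive, epsilon, rng)
        print(f"epsilon={epsilon}: noisy count of positive tests = {noisy:.2f}")
```

In this sketch a smaller `epsilon` gives stronger privacy but a noisier answer, which is the privacy and utility trade-off the abstract refers to.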

Published

2021-11-19

How to Cite

B. Filho, M. E., Duarte Neto, E. R., & C. Machado, J. (2021). Privacy-preserving of patients with Differential Privacy: an experimental evaluation in COVID-19 dataset. Journal of Information and Data Management, 12(5). https://doi.org/10.5753/jidm.2021.1947

Section

SBBD 2020 Short papers - Extended Papers