Balancing Privacy and Utility: Evaluating Distributional Shifts and Accuracy in Differentially Private Synthetic Breached Data

Abstract


With increasing reliance on data-driven technologies, it is essential to ensure the privacy of individuals in datasets. This paper investigates the distributional shift introduced by differential privacy using a synthetically generated dataset simulating leaked personal information. We apply the Laplace mechanism to a hypothetical adversarial disclosure scenario involving hotel-booking data and analyze the impact of varying the privacy budget (ε) and sensitivity (Δf) parameters across 6,120 combinations. Using Jensen-Shannon Distance and Mean Absolute Percentage Error metrics, we quantify distributional shift and accuracy degradation. Our findings reveal that attributes with higher entropy experience greater shift under noise addition, informing parameter-tuning strategies that protect sensitive data while preserving its analytical value.
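As a minimal sketch of the evaluation described above (assuming a single numeric attribute and an illustrative parameter grid; the attribute name, values, and binning below are hypothetical, and the paper's synthetic breach dataset and full 6,120-combination sweep are not reproduced), the following Python code perturbs values with the Laplace mechanism for a given (ε, Δf) pair and reports the resulting Jensen-Shannon distance and MAPE:

    import numpy as np
    from scipy.spatial.distance import jensenshannon

    rng = np.random.default_rng(42)

    def laplace_mechanism(values, epsilon, sensitivity):
        """Add Laplace noise with scale Δf/ε to each value."""
        scale = sensitivity / epsilon
        return values + rng.laplace(loc=0.0, scale=scale, size=values.shape)

    def js_distance(original, noisy, bins=50):
        """Jensen-Shannon distance between histograms of the two samples."""
        lo = min(original.min(), noisy.min())
        hi = max(original.max(), noisy.max())
        p, _ = np.histogram(original, bins=bins, range=(lo, hi), density=True)
        q, _ = np.histogram(noisy, bins=bins, range=(lo, hi), density=True)
        return jensenshannon(p, q, base=2)

    def mape(original, noisy):
        """Mean Absolute Percentage Error (skips zero-valued originals)."""
        mask = original != 0
        return np.mean(np.abs((original[mask] - noisy[mask]) / original[mask])) * 100

    # Hypothetical numeric attribute from a simulated hotel-booking breach
    # (e.g., nightly room rate); the study itself uses a synthetic PII dataset.
    room_rate = rng.normal(loc=150.0, scale=40.0, size=10_000)

    for epsilon in (0.1, 1.0, 10.0):        # privacy budget values (illustrative)
        for sensitivity in (1.0, 10.0):      # assumed Δf values (illustrative)
            noisy = laplace_mechanism(room_rate, epsilon, sensitivity)
            print(f"eps={epsilon:<5} df={sensitivity:<5} "
                  f"JSD={js_distance(room_rate, noisy):.3f} "
                  f"MAPE={mape(room_rate, noisy):.1f}%")

Because the noise scale is Δf/ε, lowering ε or raising Δf widens the perturbation, so both the Jensen-Shannon distance and the MAPE increase; this mirrors the privacy-utility trade-off the study quantifies across its parameter grid.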

Published: 2025-09-01

How to cite: RODRIGUES, Gabriel Arquelau Pimenta et al. Balancing Privacy and Utility: Evaluating Distributional Shifts and Accuracy in Differentially Private Synthetic Breached Data. In: BRAZILIAN SYMPOSIUM ON CYBERSECURITY (SBSEG), 25., 2025, Foz do Iguaçu/PR. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 239-255. DOI: https://doi.org/10.5753/sbseg.2025.9825.
