Data Dependent Itemset Mining Under Local Differential Privacy

  • Renan R. Oliveira Universidade Federal do Ceará http://orcid.org/0009-0001-0594-4161
  • José S. Costa Universidade Federal do Ceará (UFC)
  • José M. Monteiro Universidade Federal do Ceará (UFC)
  • Javam de Castro Machado Universidade Federal do Ceará (UFC)

Resumo


Frequent itemset mining with Apriori-based methods is widely used to uncover patterns in large datasets. However, under the private setting of Local Differential Privacy (LDP), this task poses key challenges, such as domain scalability and user set-valued data. Prior works address these problems through fixed-size transaction reporting protocols but often rely on hard-coded parameters, overlooking complexities from differing dataset profiles. To address this limitation, we propose a data-dependent framework for identifying the top-k most frequent itemsets under LDP. Through initial metadata queries and the integration of an optimization problem, we significantly improve over prior work on both synthetic and real-world datasets.
Palavras-chave: Mining. Private, Apriori, Itemset, Local, Patterns, Optimization, Privacy, Framework, Data-dependent, Top-k, Data

Referências

Abadi, M., Chu, A., Goodfellow, I., McMahan, H. B., Mironov, I., Talwar, K., and Zhang, L. (2016). Deep learning with differential privacy. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, CCS ’16, New York, NY, USA. Association for Computing Machinery.

Chan, T. H. H., Li, M., Shi, E., and Xu, W. (2012). Differentially private continual monitoring of heavy hitters from distributed streams. In Privacy Enhancing Technologies: 12th International Symposium, PETS 2012, Vigo, Spain, July 11-13, 2012. Proceedings 12, pages 140–159. Springer.

Chen, Z. and Wang, J. (2022). Ldp-fpminer: Fp-tree based frequent itemset mining with local differential privacy. arXiv preprint arXiv:2209.01333.

Dwork, C. (2011). A firm foundation for private data analysis. Commun. ACM, 54(1):86–95.

Dwork, C., McSherry, F., Nissim, K., and Smith, A. (2006). Calibrating noise to sensitivity in private data analysis. In Halevi, S. and Rabin, T., editors, Theory of Cryptography, pages 265–284. Springer Berlin Heidelberg.

Dwork, C., Roth, A., et al. (2014). The algorithmic foundations of differential privacy. Foundations and Trends® in Theoretical Computer Science, 9(3–4):211–407.

Erlingsson, U., Pihur, V., and Korolova, A. (2014). Rappor: Randomized aggregatable privacy-preserving ordinal response. In Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security, New York, NY, USA. Association for Computing Machinery.

Filho, J. S. C. and Machado, J. C. (2023). Felip: A local differentially private approach to frequency estimation on multidimensional datasets. In International Conference on Extending Database Technology.

Frenkel, S. and Isaac, M. (2018). Facebook security breach exposes accounts of 50 million users. [link]. Accessed: 2024-01-12.

Geurts, K., Wets, G., Brijs, T., and Vanhoof, K. (2003). Profiling high frequency accident locations using association rules. In Proceedings of the 82nd Annual Transportation Research Board, Washington DC. (USA), January 12-16, page 18pp.

K, M. V., Immanuel Johnraja, J., Jeba Leelipushpam, G., Jebaveerasingh Jebadurai, J., and Santhosam, I. B. (2021). Security and privacy issues in the internet of things – a survey. In 2021 Fifth International Conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud) (I-SMAC).

Kairouz, P., Bonawitz, K., and Ramage, D. (2016). Discrete distribution estimation under local privacy. In International Conference on Machine Learning, pages 2436–2444. PMLR.

Li, Y., Huang, C., Cheng, M., Lv, T., Zhao, Y., Sun, Y., and Yuan, Y. (2025). Privminer: a similar-first approach to frequent itemset mining under local differential privacy. World Wide Web, 28(2):25.

Lin, F., Muzumdar, K., Laptev, N. P., Curelea, M.-V., Lee, S., and Sankar, S. (2020). Fast dimensional analysis for root cause investigation in a large-scale service environment. Proc. ACM Meas. Anal. Comput. Syst., 4(2).

Neto, A. A. M., Neto, E. D., Filho, J. C., and Machado, J. (2024). Locally differentially private and consistent frequency estimation of longitudinal data. In Anais do XXXIX Simpósio Brasileiro de Bancos de Dados, pages 367–380, Porto Alegre, RS, Brasil. SBC.

Oliveira, R. (2025). Sharkfin. Accessed: 2025-04-06.

Qin, Z., Yang, Y., Yu, T., Khalil, I., Xiao, X., and Ren, K. (2016). Heavy hitter estimation over set-valued data with local differential privacy. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, CCS ’16, page 192–203, New York, NY, USA. Association for Computing Machinery.

Satariano, A., Perlroth, N., and Tsang, A. (2018). Marriott hacking exposes data of up to 500 million guests. Accessed: 2024-01-12.

Wang, T., Blocki, J., Li, N., and Jha, S. (2017). Locally differentially private protocols for frequency estimation. In 26th USENIX Security Symposium (USENIX Security 17), pages 729–745.

Wang, T., Li, N., and Jha, S. (2018). Locally differentially private frequent itemset mining. In 2018 IEEE Symposium on Security and Privacy (SP), pages 127–143.

Warner, S. L. (1965). Randomized response: A survey technique for eliminating evasive answer bias. Journal of the American Statistical Association, 60(309):63–69.

Zhao, D., Zhao, S.-Y., Chen, H., Liu, R.-X., Li, C.-P., and Zhang, X.-Y. (2023). Hadamard encoding based frequent itemset mining under local differential privacy. Journal of Computer Science and Technology, 38(6):1403–1422.

Zhu, E. (2018). Set similarity search benchmarks. Accessed: 2025-04-06.

Zhu, T., Li, G., Zhou, W., and Yu, P. S. (2017). Differentially private data publishing and analysis: A survey. IEEE Transactions on Knowledge and Data Engineering, 29(8):1619–1638.
Publicado
29/09/2025
OLIVEIRA, Renan R.; COSTA, José S.; MONTEIRO, José M.; MACHADO, Javam de Castro. Data Dependent Itemset Mining Under Local Differential Privacy. In: SIMPÓSIO BRASILEIRO DE BANCO DE DADOS (SBBD), 40. , 2025, Fortaleza/CE. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025 . p. 521-534. ISSN 2763-8979. DOI: https://doi.org/10.5753/sbbd.2025.247281.