A Dynamic Approach to Health Data Anonymization Using Quantiles

  • Kristtopher K. Coelho UFV
  • Maurício M. Okuyama UF
  • Michele Nogueira UFMG
  • Alex Borges Vieira UFJF
  • Edelberto Franco Silva UFJF
  • José Augusto M. Nacif UFV

Abstract


Technological advances enable the integration of Internet of Things (IoT) devices for continuous, proactive patient monitoring. These devices collect large volumes of sensitive data whose privacy must be protected. Anonymization provides privacy by removing or modifying information that identifies an individual. However, traditional anonymization techniques such as k-anonymity depend on a fixed, predefined value of k, which leaves them susceptible to attribute identification attacks. This article presents Dynamic Anonymization by Separatrices (DAS), an approach that uses separatrix measures (quantiles) to define the ideal value of k and to dynamically group the data to be anonymized. Results show that the proposed approach efficiently mitigates attribute identification attacks.
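
As a rough illustration of the general idea of quantile-based grouping, and not the DAS algorithm itself (which the abstract does not detail), the following sketch splits a numeric attribute at its quartile cut points and releases each group's mean in place of the raw values. The function name `anonymize`, the parameter `n_groups`, and the sample heart-rate readings are all hypothetical, chosen only for this example. The smallest group size plays the role of k.

```python
from bisect import bisect_right
from statistics import mean, quantiles


def anonymize(values, n_groups=4):
    """Replace each value with the mean of its quantile group.

    Illustrative only: a simple quantile-based microaggregation,
    not the authors' DAS method.
    """
    # Cut points that divide the data into n_groups quantile intervals.
    cuts = quantiles(values, n=n_groups, method="inclusive")
    # Map every value to the index of the interval it falls in.
    labels = [bisect_right(cuts, v) for v in values]

    groups = {}
    for label, v in zip(labels, values):
        groups.setdefault(label, []).append(v)

    # Each record is released as its group's mean instead of its raw value.
    centroids = {label: mean(members) for label, members in groups.items()}
    released = [centroids[label] for label in labels]

    # The smallest group bounds how many records share a released value.
    k = min(len(members) for members in groups.values())
    return released, k


# Hypothetical heart-rate readings (beats per minute), invented for the example.
heart_rates = [58, 61, 64, 66, 70, 72, 75, 79, 83, 88, 94, 101]
released, k = anonymize(heart_rates)
print(f"effective k = {k}, distinct released values = {len(set(released))}")
```

Here the group size follows from the data and the chosen number of quantile intervals rather than from a k fixed in advance. Whether DAS derives k this way is an assumption of the sketch.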

Published
2024-05-20
COELHO, Kristtopher K.; OKUYAMA, Maurício M.; NOGUEIRA, Michele; VIEIRA, Alex Borges; SILVA, Edelberto Franco; NACIF, José Augusto M. A Dynamic Approach to Health Data Anonymization Using Quantiles. In: BRAZILIAN SYMPOSIUM ON COMPUTER NETWORKS AND DISTRIBUTED SYSTEMS (SBRC), 42., 2024, Niterói/RJ. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2024. p. 826-839. ISSN 2177-9384. DOI: https://doi.org/10.5753/sbrc.2024.1481.
