Methodology for Evaluating k-Anonymity-Based Anonymization in Machine Learning Models
Abstract
The growing volume of sensitive data generated across domains demands robust approaches to privacy protection. Anonymization based on k-anonymity is a prominent technique for mitigating the risk of re-identification of personal data. However, its impact on the performance of machine learning models is commonly neglected. This work proposes a novel comparative method to evaluate the effects of anonymization on machine learning models, jointly considering privacy, information loss, and model performance metrics. The results provide insights for developing and improving k-anonymity-based solutions that reconcile privacy and efficiency in distributed environments.
