A Dynamic Approach to Health Data Anonymization by Separatrices
Abstract
Technological advances make it possible to integrate Internet of Things (IoT) devices for continuous, proactive patient monitoring. These devices collect large volumes of data, much of which is sensitive and therefore requires privacy protection. Anonymization provides privacy by removing or modifying information that identifies an individual. However, traditional anonymization techniques such as k-anonymity depend on a fixed, predefined value of k and are susceptible to attribute identification attacks. This paper presents Dynamic Anonymization by Separatrices (DAS), an approach that uses separatrix measures to define the ideal value of k and to dynamically group the data to be anonymized. The results show that the proposed approach is efficient in mitigating attribute identification attacks.
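The abstract does not detail how DAS derives k or forms its groups, so what follows is only a minimal sketch of the general idea under stated assumptions, not the authors' algorithm. It takes one numeric quasi-identifier (for example, a patient's age), computes separatrices as the interior quantile cut points with NumPy's `np.percentile` (default linear interpolation), assigns each record to the group lying between consecutive cut points, and generalizes every value to its group's [min, max] interval. The function name `anonymize_by_separatrices`, the use of quartiles (`n_parts=4`), the single-attribute setting, and reading the effective k as the size of the smallest group are all illustrative assumptions.

```python
import numpy as np


def anonymize_by_separatrices(values, n_parts=4):
    """Hypothetical sketch (not the DAS algorithm itself): generalize one
    numeric quasi-identifier by grouping records between separatrices.

    Returns (generalized, k): the [min, max] interval assigned to each record
    and the effective k, taken here as the size of the smallest group.
    """
    x = np.asarray(values, dtype=float)
    # Separatrices: interior quantile cut points that split the data into
    # n_parts (quartiles for n_parts=4, deciles for n_parts=10).
    probs = np.linspace(0, 100, n_parts + 1)[1:-1]
    cuts = np.percentile(x, probs)  # NumPy default: linear interpolation
    # Group index per record: group i holds values in (cuts[i-1], cuts[i]].
    groups = np.digitize(x, cuts, right=True)

    generalized = [None] * len(x)
    sizes = []
    for g in np.unique(groups):  # only non-empty groups are visited
        members = np.flatnonzero(groups == g)
        # Replace each member's value by its group's [min, max] interval.
        interval = (float(x[members].min()), float(x[members].max()))
        for i in members:
            generalized[i] = interval
        sizes.append(len(members))
    return generalized, min(sizes)


if __name__ == "__main__":
    ages = [23, 25, 31, 35, 41, 47, 52, 58, 60, 64, 70, 75]
    intervals, k = anonymize_by_separatrices(ages)
    print(k)  # → 3
    print(intervals[0])  # → (23.0, 31.0)
```

With these 12 ages, the quartile cut points are 34, 49.5 and 61, so each of the four groups holds three records and every generalized interval is shared by three patients (k = 3). Because the cut points come from the data's own distribution, the group sizes, and hence k, follow the dataset rather than being fixed in advance. The sketch ignores multiple quasi-identifiers and the distribution of sensitive values inside each group, both of which matter for the attribute identification attacks the abstract mentions.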
Published
20/05/2024
How to Cite
COELHO, Kristtopher K.; OKUYAMA, Maurício M.; NOGUEIRA, Michele; VIEIRA, Alex Borges; SILVA, Edelberto Franco; NACIF, José Augusto M. Uma Abordagem Dinâmica para Anonimização de Dados de Saúde por Separatrizes. In: SIMPÓSIO BRASILEIRO DE REDES DE COMPUTADORES E SISTEMAS DISTRIBUÍDOS (SBRC), 42., 2024, Niterói/RJ. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2024. p. 826-839. ISSN 2177-9384. DOI: https://doi.org/10.5753/sbrc.2024.1481.