Privacy without Loss of Utility: Evaluation of De-identification Techniques in Deep Learning for Intensive Care Units

  • Vitor Matheus Valandro da Rosa UFSC
  • Giovana Nunes Inocêncio UFSC
  • Jean Everson Martina UFSC

Abstract


The growing adoption of Deep Learning in Intensive Care Units depends on access to high-granularity clinical data, which conflicts with privacy regulations such as Brazil’s LGPD. This work evaluates the trade-off between de-identification techniques and clinical AI utility. Using the MIMIC-III database, we applied k-anonymity, l-diversity, and Differential Privacy to demographic attributes and trained the ConCare model for in-hospital mortality prediction. Re-identification risk was audited under the Prosecutor, Journalist, and Marketer attack models. The results show that robust pseudonymization (k = 10, l = 2) reduces the maximum risk from 100% to 1.09% with almost no loss of predictive performance (AUROC 0.861 vs. 0.868 for the baseline). We conclude that preserving sample size is critical: Differential Privacy (ϵ = 2.0) discarded 16.3% of the data and yielded lower clinical utility (0.853) than the syntactic approaches.
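As a rough illustration of the quantities named in the abstract, the sketch below groups records into equivalence classes over a set of quasi-identifiers and reports the achieved k, the distinct l, the maximum prosecutor risk (one over the smallest class size, so a unique record gives 100%), and a marketer-style average risk (number of classes divided by number of records). It is not the authors' pipeline: the column names `age`, `sex`, and `died`, the choice of `died` as the sensitive attribute, and the toy records are all assumptions made for illustration. Journalist risk is omitted because it needs a population table that the abstract does not describe.

```python
from collections import defaultdict


def equivalence_classes(records, quasi_identifiers):
    """Group records that share the same quasi-identifier values."""
    classes = defaultdict(list)
    for row in records:
        key = tuple(row[q] for q in quasi_identifiers)
        classes[key].append(row)
    return classes


def privacy_summary(records, quasi_identifiers, sensitive):
    """Return k, distinct l, and two re-identification risk estimates."""
    classes = equivalence_classes(records, quasi_identifiers)
    sizes = [len(rows) for rows in classes.values()]
    distinct = [len({r[sensitive] for r in rows}) for rows in classes.values()]
    return {
        # k-anonymity: size of the smallest equivalence class
        "k": min(sizes),
        # distinct l-diversity: fewest distinct sensitive values in any class
        "l": min(distinct),
        # prosecutor model: worst case is the smallest class
        "prosecutor_max_risk": 1 / min(sizes),
        # marketer-style average: classes per record
        "marketer_risk": len(classes) / len(records),
    }


# Hypothetical, already-generalized toy records (not MIMIC-III data).
toy = [
    {"age": "60-69", "sex": "F", "died": 0},
    {"age": "60-69", "sex": "F", "died": 1},
    {"age": "60-69", "sex": "F", "died": 0},
    {"age": "70-79", "sex": "M", "died": 1},
    {"age": "70-79", "sex": "M", "died": 0},
]

summary = privacy_summary(toy, ["age", "sex"], "died")
print(summary)
```

On this toy table the smallest class has two records, so k = 2 and the maximum prosecutor risk is 50%. Raising k by coarser generalization lowers that bound, which is the mechanism behind the risk reduction reported above.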

Published
01/06/2026
ROSA, Vitor Matheus Valandro da; INOCÊNCIO, Giovana Nunes; MARTINA, Jean Everson. Privacy without Loss of Utility: Evaluation of De-identification Techniques in Deep Learning for Intensive Care Units. In: SIMPÓSIO BRASILEIRO DE COMPUTAÇÃO APLICADA À SAÚDE (SBCAS), 26., 2026, Ouro Preto/MG. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2026. p. 1206-1216. ISSN 2763-8952. DOI: https://doi.org/10.5753/sbcas.2026.21679.