Training and Test Machine Learning Models on Encrypted Data: Initial Results and Challenges

  • Rodrigo Kruger PUCPR
  • Jean Paul Barddal PUCPR
  • Vinicius M. A. Souza PUCPR

Abstract


Privacy is critical when applying Machine Learning (ML) models to sensitive data in domains such as healthcare, finance, and legal systems. Many of these models are trained or executed on cloud services, meaning that sensitive data is transmitted over the network or that third-party services operate directly on unprotected data during training and inference, increasing exposure to potential leaks. Data encryption is a promising solution that guarantees high privacy levels. A cryptographic approach well suited to ML is Homomorphic Encryption, which allows mathematical operations to be performed directly on ciphertexts, i.e., encrypted data, producing encrypted models and outputs that only authorized parties can decrypt. However, the protection offered by Homomorphic Encryption comes at a significant computational overhead. Additionally, only specific mathematical operations (typically additions and multiplications) are supported, and encrypted computations accumulate noise that reduces the precision of the results. This paper discusses the challenges of using encrypted data in the training and testing steps of ML models. It experimentally analyzes the impact on error rates and processing times when traditional classifiers, such as Artificial Neural Networks and Logistic Regression, are adapted to process encrypted data. We adopt the CKKS scheme, a Homomorphic Encryption method that supports approximate computations over real numbers, and adapt the activation functions of the classifiers using three approximation methods in an experimental evaluation on five medical datasets.
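To illustrate the kind of adaptation the abstract refers to: since CKKS supports only additions and multiplications, non-polynomial activation functions such as the sigmoid must be replaced by low-degree polynomials. The sketch below is not taken from the paper; it uses a degree-3 least-squares-style approximation with coefficients commonly seen in the homomorphic-encryption literature (an assumption here), and compares it to the exact sigmoid in plaintext.

```python
import math

def sigmoid(x: float) -> float:
    """Exact logistic sigmoid (not computable directly under CKKS)."""
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_poly(x: float) -> float:
    """Degree-3 polynomial surrogate for the sigmoid.

    Only additions and multiplications are used, so the same expression
    can be evaluated on CKKS ciphertexts. The coefficients below are an
    illustrative choice from the HE literature, not the paper's own.
    """
    return 0.5 + 0.197 * x - 0.004 * x ** 3

# Compare exact vs. polynomial activation on a few plaintext inputs.
for x in [-4.0, -1.0, 0.0, 1.0, 4.0]:
    print(f"x={x:+.1f}  exact={sigmoid(x):.3f}  poly={sigmoid_poly(x):.3f}")
```

The approximation is close near zero and degrades toward the edges of the fitting interval, which is one source of the accuracy loss that the paper measures; the polynomial's degree also bounds the multiplicative depth consumed per activation under CKKS.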
Published
29/09/2025
KRUGER, Rodrigo; BARDDAL, Jean Paul; SOUZA, Vinicius M. A. Training and Test Machine Learning Models on Encrypted Data: Initial Results and Challenges. In: BRAZILIAN CONFERENCE ON INTELLIGENT SYSTEMS (BRACIS), 35., 2025, Fortaleza/CE. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 562-577. ISSN 2643-6264.