Training and Test Machine Learning Models on Encrypted Data: Initial Results and Challenges

  • Rodrigo Kruger PUCPR
  • Jean Paul Barddal PUCPR
  • Vinicius M. A. Souza PUCPR

Abstract


Privacy is critical when applying Machine Learning (ML) models to sensitive data in domains such as healthcare, finance, and legal systems. Many of these models are trained or executed on cloud services, meaning that sensitive data is transmitted over the network or that third-party services operate directly on unprotected data during training and inference, increasing exposure to potential leaks. Data encryption is a promising solution that guarantees high privacy levels. A cryptographic approach well suited to ML is Homomorphic Encryption, which allows mathematical operations to be performed directly on ciphertexts, i.e., encrypted data, producing encrypted models and outputs that only authorized parties can decrypt. However, the protection offered by Homomorphic Encryption comes at a significant computational overhead. Additionally, only specific mathematical operations (typically additions and multiplications) are supported, and encrypted computations accumulate noise that reduces the precision of the results. This paper discusses the challenges of using encrypted data in the training and testing steps of ML models. It experimentally analyzes the impact on error rates and processing times when traditional classifiers, such as Artificial Neural Networks and Logistic Regression, are adapted to process encrypted data. We adopt the CKKS scheme, a Homomorphic Encryption method that supports approximate computations over real numbers, and adapt the activation functions of the classifiers using three approximation methods in an experimental evaluation on five medical datasets.
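To illustrate the kind of adaptation the abstract refers to: since CKKS supports only additions and multiplications, non-polynomial activation functions such as the sigmoid must be replaced by low-degree polynomials. The sketch below is not taken from the paper; it uses a degree-3 least-squares-style approximation with coefficients commonly seen in the homomorphic-encryption literature (an assumption here), and compares it to the exact sigmoid in plaintext.

```python
import math

def sigmoid(x: float) -> float:
    """Exact logistic sigmoid (not computable directly under CKKS)."""
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_poly(x: float) -> float:
    """Degree-3 polynomial surrogate for the sigmoid.

    Only additions and multiplications are used, so the same expression
    can be evaluated on CKKS ciphertexts. The coefficients below are an
    illustrative choice from the HE literature, not the paper's own.
    """
    return 0.5 + 0.197 * x - 0.004 * x ** 3

# Compare exact vs. polynomial activation on a few plaintext inputs.
for x in [-4.0, -1.0, 0.0, 1.0, 4.0]:
    print(f"x={x:+.1f}  exact={sigmoid(x):.3f}  poly={sigmoid_poly(x):.3f}")
```

The approximation is close near zero and degrades toward the edges of the fitting interval, which is one source of the accuracy loss that the paper measures; the polynomial's degree also bounds the multiplicative depth consumed per activation under CKKS.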
Published
29/09/2025
KRUGER, Rodrigo; BARDDAL, Jean Paul; SOUZA, Vinicius M. A. Training and Test Machine Learning Models on Encrypted Data: Initial Results and Challenges. In: BRAZILIAN CONFERENCE ON INTELLIGENT SYSTEMS (BRACIS), 35., 2025, Fortaleza/CE. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 562-577. ISSN 2643-6264.