Controlled Eigenvector Perturbation in PCA for Privacy Preservation in Sensitive Data
Abstract
The growing adoption of digital solutions based on user data requires robust privacy mechanisms. However, excessive perturbation of the data can significantly degrade the performance of Machine Learning (ML) models. In this work, we propose PerturbPCA-α, an adjustable approach based on Principal Component Analysis (PCA). The technique applies a continuous perturbation to the principal eigenvectors of the dataset, which enables explicit control of the trade-off between privacy and utility. Consequently, the learning capability of ML models is preserved, as demonstrated by experiments on healthcare data that validate the effectiveness of the proposed method.
References
Aleroud, A., Chen, Z., and Karabatis, G. (2016). Network trace anonymization using a prefix-preserving condensation-based technique (short paper). In OTM Confederated International Conferences "On the Move to Meaningful Internet Systems", pages 934–942. Springer.
Andrzejak, R. G., Lehnertz, K., Mormann, F., Rieke, C., David, P., and Elger, C. E. (2001). Indications of nonlinear deterministic and finite-dimensional structures in time series of brain electrical activity: Dependence on recording region and brain state. Physical Review E, 64(6):061907.
Barman, P. (2024). IoMT Dataset for ML-Based Health Monitoring. [Online]. Available: Kaggle Dataset.
Boikanyo, K., Zungeru, A. M., Sigweni, B., Yahya, A., and Lebekwe, C. (2023). Remote patient monitoring systems: Applications, architecture, and challenges. Scientific African, 20:e01638.
Gong, X., Chen, Y., Wang, Q., Wang, M., and Li, S. (2022). Private data inference attacks against cloud: Model, technologies, and research directions. IEEE Communications Magazine, 60(9):46–52.
Hyrup, T., Lautrup, A. D., Zimek, A., and Schneider-Kamp, P. (2025). A systematic review of privacy-preserving techniques for synthetic tabular health data. Discover Data, 3(1):1–32.
Kamalov, F., Pourghebleh, B., Gheisari, M., Liu, Y., and Moussa, S. (2023). Internet of medical things privacy and security: Challenges, solutions, and future trends from a new perspective. Sustainability, 15(4).
Mandal, G. (2025). Patient Data for Healthcare Monitoring System. [Online]. Available: Kaggle Dataset.
Nobre, F. V. J., Silva, D. d. S., Ferreira, M. C. M. M., Brito, M. L. M. L., de Araújo, T. P., and Gomes, R. L. (2025). Time-weighted correlation approach to identify high delay links in internet service providers. Journal of Internet Services and Applications, 16(1):419–430.
Ozcelik, M. M., Kok, I., and Ozdemir, S. (2025). A survey on internet of medical things (iomt): Enabling technologies, security and explainability issues, challenges, and future directions. Expert Systems, 42(5):e70010.
Pimenta, I., Silva, D., Moura, E., Silveira, M., and Gomes, R. L. (2024). Impact of data anonymization in machine learning models. In Proceedings of the 13th Latin-American Symposium on Dependable and Secure Computing, pages 188–191.
Pimenta, I. A., Lee, M. H., Bittencourt, L. F., and Gomes, R. L. (2025). Adaptive privacy based on mutual information for machine learning in edge-cloud environments. IEEE Networking Letters.
Razdan, S. and Sharma, S. (2022). Internet of medical things (iomt): Overview, emerging technologies, and case studies. IETE Technical Review, 39(4):775–788.
Shokri, R., Stronati, M., Song, C., and Shmatikov, V. (2017). Membership inference attacks against machine learning models. In 2017 IEEE symposium on security and privacy (SP), pages 3–18. IEEE.
Silveira, M. M., Portela, A. L., Menezes, R. A., Souza, M. S., Silva, D. S., Mesquita, M. C., and Gomes, R. L. (2023). Data protection based on searchable encryption and anonymization techniques. In NOMS 2023-2023 IEEE/IFIP Network Operations and Management Symposium, pages 1–5. IEEE.
Souza, M. S., Ribeiro, S. E. S. B., Lima, V. C., Cardoso, F. J., and Gomes, R. L. (2024). Combining regular expressions and machine learning for sql injection detection in urban computing. Journal of Internet Services and Applications, 15(1):103–111.
Thabit, F., Alhomdy, S., and Jagtap, S. (2021). A new data security algorithm for the cloud computing based on genetics techniques and logical-mathematical functions. International Journal of Intelligent Networks, 2:18–33.
Published
2026-05-25
How to Cite
PIMENTA, Ivo A.; FREITAS, Kaynan S.; MOURA, Evellin S.; NASCIMENTO, Erick S.; FARIA, Fabio A.; GOMES, Rafael L. Controlled Eigenvector Perturbation in PCA for Privacy Preservation in Sensitive Data. In: BRAZILIAN SYMPOSIUM ON COMPUTER NETWORKS AND DISTRIBUTED SYSTEMS (SBRC), 44., 2026, Praia do Forte/BA. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2026. p. 1136-1149. ISSN 2177-9384. DOI: https://doi.org/10.5753/sbrc.2026.19276.
