Evaluating Federated Learning with Homomorphic Encryption for Medical Named Entity Recognition Using Compact BERT Models

Marcos F. Pontes; Rodrigo C. Pedrosa; Pedro H. Lopes; Eduardo J. Luz

doi:10.5753/stil.2024.245381

Marcos F. Pontes UFOP http://orcid.org/0000-0001-8721-0171
Rodrigo C. Pedrosa UFOP https://orcid.org/0000-0003-2547-3835
Pedro H. Lopes UFOP
Eduardo J. Luz UFOP https://orcid.org/0000-0001-5249-1559

Abstract

Medical Named Entity Recognition (NER) identifies and categorizes medical entities in unstructured text, a capability that is crucial for health monitoring tasks. Despite advances with Large Language Models (LLMs), medical NER remains challenging because labeled data is limited, dispersed across institutions, and protected by privacy regulations. Federated Learning (FL) offers a solution by enabling decentralized model training while preserving data privacy, but it is vulnerable to Byzantine attacks. This research proposes a simple and secure FL protocol using Homomorphic Encryption (HE), called FedHE, that removes the need for trust between the federations and the training coordinator. Encrypted FL imposes significant constraints on resource consumption and performance, making state-of-the-art language models impractical. This research assesses how well compact BERT representations perform in federated medical NER tasks compared with state-of-the-art approaches. The results show that compact BERT representations, such as BERTmini, are competitive with the state of the art and feasible to use in FedHE. However, resource consumption overhead remains a challenge, particularly as the number of clients increases.

Keywords: Cryptography, Federated Learning, Homomorphic Encryption, Named Entity Recognition, BERT
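
To make the idea of encrypted aggregation in the abstract concrete, the following is a minimal sketch of federated averaging over additively homomorphic ciphertexts. It is not the FedHE protocol itself: the abstract does not state which HE scheme or library FedHE uses, so a toy Paillier cryptosystem is swapped in here purely because it fits in a few lines of standard-library Python. The key size, the fixed-point scale, the function names (encrypt, decrypt, aggregate) and the assumption that all clients share one key pair hidden from the coordinator are illustrative choices, not details from the paper.

```python
import math
import random

# Toy Paillier cryptosystem (additively homomorphic).
# ASSUMPTION: the scheme, key size and encoding are illustrative only; the
# primes below are far too small for real security.
P, Q = 2**31 - 1, 2**61 - 1          # two Mersenne primes
N = P * Q                            # public modulus
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)   # private: lcm(P-1, Q-1)
MU = pow(LAM, -1, N)                 # private: valid for generator g = N + 1
SCALE = 10_000                       # fixed-point scale for encoding floats


def encrypt(value: float) -> int:
    """Client side: encrypt one model weight under the public key N."""
    m = round(value * SCALE) % N     # negatives wrap around modulo N
    while True:
        r = random.randrange(2, N)
        if math.gcd(r, N) == 1:
            break
    # (N + 1)^m mod N^2 equals 1 + m*N, which avoids one exponentiation.
    return (1 + m * N) % N2 * pow(r, N, N2) % N2


def decrypt(cipher: int) -> float:
    """Client side: decrypt with the private key (LAM, MU)."""
    m = (pow(cipher, LAM, N2) - 1) // N * MU % N
    if m > N // 2:                   # map the upper half back to negatives
        m -= N
    return m / SCALE


def aggregate(encrypted_updates):
    """Coordinator side: add client vectors without seeing any plaintext.

    Multiplying Paillier ciphertexts adds the underlying plaintexts.
    """
    total = encrypted_updates[0]
    for update in encrypted_updates[1:]:
        total = [a * b % N2 for a, b in zip(total, update)]
    return total


# Three hypothetical clients, each holding a tiny weight vector.
client_weights = [[0.5, -1.25, 2.0], [1.5, 0.25, -1.0], [1.0, 1.0, 2.0]]
encrypted = [[encrypt(w) for w in weights] for weights in client_weights]

summed = aggregate(encrypted)        # the coordinator only handles ciphertexts
averaged = [decrypt(c) / len(client_weights) for c in summed]
print(averaged)
```

In this sketch the coordinator never holds the private key, so it can compute the sum but cannot read any individual update. The cost side of the abstract is also visible: each weight becomes a ciphertext many times larger than a float, and the aggregation work grows with both the model size and the number of clients, which is one reason compact models are attractive in this setting.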
