Evaluating Federated Learning with Homomorphic Encryption for Medical Named Entity Recognition Using Compact BERT Models
Abstract
Medical Named Entity Recognition (NER) identifies and categorizes medical entities in unstructured text, a task crucial for health monitoring. Despite advances brought by Large Language Models (LLMs), medical NER remains challenging because labeled data are scarce, dispersed across institutions, and protected by privacy regulations. Federated Learning (FL) addresses this by enabling decentralized model training while preserving data privacy, but it is vulnerable to Byzantine attacks. This research proposes FedHE, a simple and secure FL protocol based on Homomorphic Encryption (HE) that removes the need for trust between the federated clients and the training coordinator. Encrypted FL imposes significant constraints on resource consumption and performance, which makes state-of-the-art language models impractical. This research therefore assesses how well compact BERT representations perform on federated medical NER tasks compared with state-of-the-art approaches. The results showed that compact BERT representations, such as BERTmini, are competitive with the state of the art and are feasible to use in FedHE. However, resource consumption overheads remain a challenge, particularly as the number of clients increases.
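The abstract does not say which homomorphic scheme FedHE uses or how keys are distributed. As a minimal sketch of the general idea only, the following example uses a toy Paillier cryptosystem, an additively homomorphic scheme, to show how a coordinator can sum client weight updates that it cannot read. Paillier is a swapped-in choice made because it fits in a few lines of standard-library Python; FedHE may well use a different scheme. The helper names (`client_encrypt`, `server_aggregate`, `client_decrypt_average`), the fixed-point `SCALE`, and the assumption that the clients (not the coordinator) hold the decryption key are all hypothetical. The primes are far too small to be secure.

```python
import math
import secrets

# Toy Paillier key pair (hypothetical, NOT secure): two Mersenne primes keep
# the arithmetic readable. Real deployments need much larger keys.
P, Q = 2**31 - 1, 2**61 - 1
N = P * Q
N2 = N * N
G = N + 1                                   # standard simplified generator
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)
MU = pow(LAM, -1, N)                        # valid because G = N + 1

SCALE = 10**6  # assumed fixed-point precision for float weights


def encrypt(m: int) -> int:
    """Encrypt an integer plaintext (negative values wrap modulo N)."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return (pow(G, m % N, N2) * pow(r, N, N2)) % N2


def decrypt(c: int) -> int:
    """Decrypt and map the result back to a signed integer."""
    m = ((pow(c, LAM, N2) - 1) // N) * MU % N
    return m - N if m > N // 2 else m


def client_encrypt(update: list[float]) -> list[int]:
    """Hypothetical client step: fixed-point encode, then encrypt each weight."""
    return [encrypt(round(w * SCALE)) for w in update]


def server_aggregate(encrypted_updates: list[list[int]]) -> list[int]:
    """Hypothetical coordinator step: multiplying ciphertexts element-wise adds
    the hidden plaintexts, so no individual update is ever revealed."""
    agg = encrypted_updates[0]
    for enc in encrypted_updates[1:]:
        agg = [(a * b) % N2 for a, b in zip(agg, enc)]
    return agg


def client_decrypt_average(agg: list[int], num_clients: int) -> list[float]:
    """Hypothetical key-holder step: decrypt the encrypted sum, then average."""
    return [decrypt(c) / SCALE / num_clients for c in agg]


updates = [[0.10, -0.20, 0.30], [0.30, 0.00, -0.10], [0.20, 0.20, 0.40]]
encrypted = [client_encrypt(u) for u in updates]
avg = client_decrypt_average(server_aggregate(encrypted), len(updates))
print([round(x, 6) for x in avg])  # → [0.2, 0.0, 0.2]
```

Only sums are computed under encryption here, which is enough for federated averaging; the division by the number of clients happens after decryption. Each ciphertext is also much larger than the float it encodes, which illustrates why encrypted FL adds the resource overheads the abstract mentions.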