FedSeleKDistill: Empowering Client Selection with Knowledge Distillation for Federated Learning on Non-IID Data
Abstract
Federated Learning (FL) is a distributed approach in which multiple devices collaborate to train a shared global model. During training, client devices must communicate their gradients to a central server to update the global model's weights, which incurs significant communication costs (bandwidth usage and the number of messages exchanged). The heterogeneous nature of client datasets poses an additional challenge. In this context, we introduce FedSeleKDistill, a Federated Selection and Knowledge Distillation Algorithm, to reduce overall communication costs. FedSeleKDistill is a novel combination of (i) client selection and (ii) knowledge distillation, with three main objectives: (i) reduce the number of devices training in each round; (ii) decrease the number of rounds needed to reach convergence; and (iii) mitigate the effect of heterogeneous client data on the effectiveness of the global model. Our experimental evaluation on the MNIST dataset shows that FedSeleKDistill trains the global model to convergence efficiently, achieving higher accuracy and faster convergence than state-of-the-art baselines. Our results also show superior performance when analyzing accuracy on the clients' local datasets.
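To make the combination described above concrete, the following is a minimal, hypothetical Python sketch of one federated round that selects a subset of clients and trains them with a distillation-style loss, with the global model acting as teacher. The selection criterion, helper names (estimate_loss, local_train), and hyperparameters are assumptions for illustration only and do not reproduce the actual FedSeleKDistill algorithm.

# Illustrative sketch only: one federated round combining (i) client selection and
# (ii) a distillation-style local loss. All names and criteria below are assumptions.
import numpy as np

def softmax(logits, temperature=1.0):
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, labels, teacher_logits, alpha=0.5, temperature=2.0):
    # Cross-entropy on hard labels blended with KL divergence to the teacher's soft targets.
    p_s = softmax(student_logits)
    ce = -np.log(p_s[np.arange(len(labels)), labels] + 1e-12).mean()
    q_t = softmax(teacher_logits, temperature)
    q_s = softmax(student_logits, temperature)
    kl = (q_t * (np.log(q_t + 1e-12) - np.log(q_s + 1e-12))).sum(axis=-1).mean()
    return (1.0 - alpha) * ce + alpha * (temperature ** 2) * kl

def federated_round(global_weights, clients, num_selected, local_train):
    # Hypothetical selection criterion: pick the clients whose local loss on the
    # current global model is largest, so fewer devices train in each round.
    scores = np.array([c.estimate_loss(global_weights) for c in clients])
    selected = np.argsort(scores)[-num_selected:]
    updates, sizes = [], []
    for i in selected:
        # local_train is expected to minimize distillation_loss, with the global
        # model acting as the teacher for the client's local (student) model.
        new_weights, num_examples = local_train(clients[i], global_weights, distillation_loss)
        updates.append(new_weights)
        sizes.append(num_examples)
    # FedAvg-style aggregation weighted by local dataset size.
    weights = np.array(sizes, dtype=float) / sum(sizes)
    return sum(w * u for w, u in zip(weights, updates))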
Published
20/05/2024
How to Cite
MOHAMED, Aissa H.; SOUZA, Allan M. de; COSTA, Joahannes B. D. da; VILLAS, Leandro A.; REIS, Julio C. dos. FedSeleKDistill: Empoderando a Escolha de Clientes com a Destilação do Conhecimento para Aprendizado Federado em Dados Não-IID. In: WORKSHOP DE COMPUTAÇÃO URBANA (COURB), 8., 2024, Niterói/RJ. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2024. p. 71-84. ISSN 2595-2706. DOI: https://doi.org/10.5753/courb.2024.3238.