Federated Learning with Embedding Generation for Statistical Heterogeneity Control
Abstract
Federated Learning enables collaborative training of machine learning models without sharing local data, addressing growing concerns over data privacy. However, heterogeneous data distributions across clients remain a major challenge, often degrading model performance. In this paper, we propose FLEG, a novel approach that alternates classifier training with the training of a Conditional Generative Adversarial Network (CGAN) to augment client datasets, mitigating statistical heterogeneity and, consequently, improving classification performance. Unlike prior methods, FLEG generates synthetic embeddings instead of images, adding an extra layer of protection against data leakage. Experimental results show that FLEG outperforms the FedAvg baseline by up to 14 percentage points in validation accuracy on CIFAR-10 under the evaluated settings. The code is available at https://github.com/gustavoguaragna/FLEG.
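The two ingredients named above, FedAvg-style aggregation and augmentation of a client's data with label-conditioned synthetic embeddings, can be illustrated with a minimal sketch. This is not the FLEG implementation. The function names, the flat-list parameter format, the "top up to the largest local class" rule, and the Gaussian `toy_generator` standing in for a trained CGAN generator are all assumptions made for illustration only.

```python
import random
from collections import Counter


def fedavg(client_weights, client_sizes):
    """Average flat parameter lists, weighting each client by its local dataset size."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]


def toy_generator(label, dim, rng):
    """Hypothetical stand-in for a trained conditional generator.

    Returns a random embedding whose mean depends on the requested label.
    A real CGAN generator would map (noise, label) to a learned embedding.
    """
    return [rng.gauss(float(label), 1.0) for _ in range(dim)]


def balance_with_synthetic(embeddings, labels, num_classes, dim, rng):
    """Top up every class to the size of the client's largest class (assumed rule)."""
    counts = Counter(labels)
    target = max(counts.values())
    out_x, out_y = list(embeddings), list(labels)
    for c in range(num_classes):
        for _ in range(target - counts.get(c, 0)):
            out_x.append(toy_generator(c, dim, rng))
            out_y.append(c)
    return out_x, out_y


# Usage: two clients with 1 and 3 samples; the larger client dominates the average.
global_weights = fedavg([[1.0, 2.0], [3.0, 4.0]], [1, 3])  # → [2.5, 3.5]

# Usage: a skewed client holding classes 0 and 1 only, out of 3 classes.
rng = random.Random(0)
real_x = [[0.1, 0.2], [0.0, 0.3], [0.2, 0.1], [1.1, 0.9]]
real_y = [0, 0, 0, 1]
aug_x, aug_y = balance_with_synthetic(real_x, real_y, num_classes=3, dim=2, rng=rng)
```

In this toy setting the augmented client ends up with the same number of samples in every class, which is the sense in which synthetic embeddings can reduce label skew before local classifier training. How FLEG actually schedules generator and classifier training, and how many embeddings it generates, is specified in the paper body and the linked repository rather than here.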
