Mitigating the Impact of Non-IID Distributions in Federated Learning for Recommendation Systems
Abstract
This work investigates the impact of non-independent and identically distributed (non-IID) data on the performance of movie Recommender Systems (RS) trained with Federated Learning (FL). Through collaboration between parties, FL distributes the computational cost and expands the diversity of data available to the training process, favoring the construction of RSs with good generalization capacity. In this context, and to analyze the effectiveness of a partial data sharing strategy, experiments were conducted in four scenarios using the MovieLens 1M dataset: (C1) IID data; (C2) non-IID data without sharing; (C3) non-IID data with 1% sharing; and (C4) non-IID data with 5% sharing. Two federated paradigms were used: cross-silo and cross-device. Experimental results indicate that partial data sharing is a promising approach to mitigate the adverse effects of non-IID distributions in federated learning, yielding, for example, an increase of approximately 9.5% in NDCG and a reduction of 0.046 in MAE. The strategy thus balances predictive performance, privacy, and computational cost.
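The non-IID scenarios with partial sharing (C2 to C4) can be illustrated with a minimal sketch. It is not the authors' implementation: the function name partition_with_sharing, the record layout (user_id, item_id, rating), and the use of rating-sorted shards to induce label skew are all assumptions made here for illustration. The idea shown is that a small fraction of the records is sampled uniformly, removed from the private pool, and appended to every client's skewed shard.

```python
import random


def partition_with_sharing(records, num_clients, share_fraction, seed=0):
    """Split records into non-IID client shards plus a common shared subset.

    records: list of (user_id, item_id, rating) tuples (hypothetical layout).
    share_fraction: 0.0 mimics C2, 0.01 mimics C3, 0.05 mimics C4.
    Returns (clients, shared), where clients is a list of record lists.
    """
    rng = random.Random(seed)

    # Uniformly sample the records that every client will receive.
    n_shared = round(share_fraction * len(records))
    shared_idx = set(rng.sample(range(len(records)), n_shared))
    shared = [r for i, r in enumerate(records) if i in shared_idx]
    private = [r for i, r in enumerate(records) if i not in shared_idx]

    # Assumed skew mechanism: sort by rating and cut contiguous shards,
    # so each client sees only a narrow range of rating values.
    private.sort(key=lambda r: r[2])
    shard_size = -(-len(private) // num_clients)  # ceiling division

    clients = []
    for c in range(num_clients):
        own = private[c * shard_size:(c + 1) * shard_size]
        clients.append(own + shared)
    return clients, shared


# Synthetic stand-in for the rating data: 1000 records, ratings 1..5.
records = [(u, i, 1 + (u + i) % 5) for u in range(50) for i in range(20)]
clients_c2, _ = partition_with_sharing(records, num_clients=5, share_fraction=0.0)
clients_c4, shared_c4 = partition_with_sharing(records, num_clients=5, share_fraction=0.05)
```

With share_fraction set to 0.0, each synthetic client holds a single rating value, which is an extreme form of label skew. Raising the fraction gives every client a small, identically distributed sample in addition to its own shard, at the cost of exposing that fraction of the data to all parties.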
