An Experimental Framework for Studying Non-IID Data in Federated Learning for Network Telemetry
Abstract
The increasing complexity of emerging 5G and 6G network environments has intensified the need for data-driven automation under heterogeneous and dynamic conditions. Federated Learning (FL) is a promising paradigm in this context. This paper presents an experimental framework that generates realistic Non-Independent and Identically Distributed (Non-IID) datasets through the controlled execution of a distributed service and the collection of its telemetry, with the aim of improving the applicability of FL to network automation. Using Apache Cassandra as a representative cloud-native application, we construct datasets that exhibit both temporal and structural heterogeneity. We statistically characterize these datasets and evaluate their impact on regression models and on horizontal federated learning with a Wide & Deep architecture. Results show that, while horizontal federation improves generalization compared to direct cross-dataset transfer, its performance degrades under pronounced structural Non-IID conditions, highlighting both its potential and its limitations.
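The abstract refers to horizontal federated learning but does not describe the server-side aggregation step. The following is a minimal sketch that assumes a FedAvg-style rule, in which the server averages client parameters weighted by each client's local sample count. The abstract does not say which aggregation rule the paper uses. The function name `fedavg` and the dict-of-lists parameter representation are hypothetical and chosen only for illustration.

```python
from typing import Dict, List

# Hypothetical representation: parameter name -> flat list of floats.
Params = Dict[str, List[float]]


def fedavg(client_params: List[Params], client_sizes: List[int]) -> Params:
    """Sample-size-weighted averaging of client models (FedAvg-style sketch)."""
    if not client_params or len(client_params) != len(client_sizes):
        raise ValueError("need exactly one sample count per client model")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    global_params: Params = {}
    for name, first in client_params[0].items():
        acc = [0.0] * len(first)
        for params, n_k in zip(client_params, client_sizes):
            if len(params[name]) != len(first):
                raise ValueError(f"shape mismatch for parameter {name!r}")
            weight = n_k / total  # each client counts in proportion to its data size
            for i, value in enumerate(params[name]):
                acc[i] += weight * value
        global_params[name] = acc
    return global_params


if __name__ == "__main__":
    clients = [{"w": [1.0, 2.0]}, {"w": [3.0, 6.0]}]
    print(fedavg(clients, [100, 300]))  # {'w': [2.5, 5.0]}
```

With two clients holding 100 and 300 samples, the larger client receives weight 0.75 and dominates the average. This gives one intuition for why strongly skewed (Non-IID) client data can pull a federated model toward the majority client's distribution.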
Published
25/05/2026
How to Cite
RIBEIRO, Johny M. B.; VANDIKAS, Konstantinos; MARQUEZINI, Maria Valéria; ROTHENBERG, Christian Esteve; PASQUINI, Rafael. An Experimental Framework for Studying Non-IID Data in Federated Learning for Network Telemetry. In: SIMPÓSIO BRASILEIRO DE REDES DE COMPUTADORES E SISTEMAS DISTRIBUÍDOS (SBRC), 44., 2026, Praia do Forte/BA. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2026. p. 127-140. ISSN 2177-9384. DOI: https://doi.org/10.5753/sbrc.2026.19206.
