Incremental and Risk-Sensitive Federated Learning for Ranking Models in Scenarios with Heterogeneous Data Distributions

Gestefane Rabbi; Celso França; Daniel Xavier de Sousa; Thierson Couto Rosa; Jussara M. Almeida; Marcos André Gonçalves

doi:10.5753/sbbd.2025.246990

Gestefane Rabbi Universidade Federal de Minas Gerais (UFMG)
Celso França Universidade Federal de Minas Gerais (UFMG)
Daniel Xavier de Sousa Instituto Federal de Goiás (IFG)
Thierson Couto Rosa Universidade Federal de Goiás (UFG)
Jussara M. Almeida Universidade Federal de Minas Gerais (UFMG) https://orcid.org/0000-0001-9142-2919
Marcos André Gonçalves Universidade Federal de Minas Gerais (UFMG)

Abstract

This work proposes a new Federated Learning to Rank (FL2R) strategy for scenarios in which data are non-independent and non-identically distributed (non-IID) across clients. We present FedRisk, a risk-sensitive aggregation method that weights client contributions according to their reliability, combined with a mechanism that reuses parameters of the previous global model, to mitigate the effects of data heterogeneity. Experiments on the MSLR-WEB10K dataset show that FedRisk outperforms FedProx, the strongest baseline, narrowing the performance gap between federated and centralized models. FedRisk achieved a 15.6% improvement in nDCG@5 over FedProx and substantially reduced variance, making training more stable across rounds. Moreover, for metrics such as nDCG@10, FedRisk matched the performance of the centralized model, a result not achieved by any other compared method and all the more notable in a non-IID federated setting.
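The abstract names two ingredients, risk-sensitive weighting of clients and reuse of the previous global model, but does not give FedRisk's exact rule. The sketch below is a hypothetical illustration of how such an aggregation step can work, not the authors' implementation. It assumes that each client reports per-query nDCG@10 for its local model and for the previous global model on held-out queries, scores each client with the gain-versus-loss trade-off of Wang et al. (2012) (average gain minus (1 + α) times average loss), converts the scores into weights with a softmax, and blends the weighted client average with the previous global parameters using a mixing coefficient β. The function names `urisk`, `risk_weights` and `aggregate` and the parameters `alpha`, `temperature` and `beta` are illustrative assumptions.

```python
import numpy as np


def urisk(client_scores, baseline_scores, alpha=1.0):
    """Risk-reward score of a client model relative to a baseline (higher is better).

    Per-query deltas (client minus baseline) are split into gains and losses;
    losses are penalized (1 + alpha) times, following Wang et al. (2012).
    """
    delta = np.asarray(client_scores, dtype=float) - np.asarray(baseline_scores, dtype=float)
    reward = np.maximum(delta, 0.0).mean()  # average gain over all queries
    risk = np.maximum(-delta, 0.0).mean()   # average loss over all queries
    return reward - (1.0 + alpha) * risk


def risk_weights(scores, temperature=1.0):
    """Map per-client risk scores to positive weights that sum to 1 (softmax)."""
    z = np.asarray(scores, dtype=float) / temperature
    z = z - z.max()  # shift for numerical stability; softmax is shift-invariant
    w = np.exp(z)
    return w / w.sum()


def aggregate(prev_global, client_params, weights, beta=0.5):
    """Blend the previous global model with the risk-weighted client average.

    prev_global: shape (d,); client_params: shape (K, d); weights: shape (K,).
    beta (hypothetical) is the fraction of the previous global model that is kept.
    """
    avg = np.asarray(weights, dtype=float) @ np.asarray(client_params, dtype=float)
    return beta * np.asarray(prev_global, dtype=float) + (1.0 - beta) * avg


if __name__ == "__main__":
    # Toy, made-up per-query nDCG@10 values (not results from the paper).
    baseline = [0.40, 0.50, 0.30]  # previous global model
    client_a = [0.45, 0.55, 0.35]  # small but consistent gains
    client_b = [0.60, 0.20, 0.30]  # one large gain, one large loss
    scores = [urisk(client_a, baseline), urisk(client_b, baseline)]
    w = risk_weights(scores, temperature=0.1)
    new_global = aggregate(np.zeros(2), [[1.0, 1.0], [3.0, 3.0]], w, beta=0.5)
    print(scores, w, new_global)
```

In this toy case, client A has the higher score (0.05 versus about -0.13) and therefore the larger weight, even though client B has the largest single-query gain. The softmax keeps every weight positive, so a risky client is down-weighted rather than excluded; clipping negative scores to zero would be a stricter alternative. The abstract does not say which of these choices, if either, FedRisk uses.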

Keywords: FedRisk, Federated Learning, Ranking in Information Retrieval, Non-IID Data, Risk-Sensitive Aggregation

References

Ads, Z. et al. (2024). Risk-aware accelerated federated learning over heterogeneous wireless networks. arXiv preprint arXiv:2401.09267.

Beutel, D. J., Topal, T., Mathur, A., Qiu, X., Fernandez-Marques, J., Gao, Y., Sani, L., Kwing, H. L., Parcollet, T., Gusmão, P. P. d., and Lane, N. D. (2020). Flower: A friendly federated learning research framework. arXiv preprint arXiv:2007.14390.

Bottou, L. (2010). Large-scale machine learning with stochastic gradient descent. In Proceedings of COMPSTAT’2010, pages 177–186. Springer.

Brownlee, J. (2018). Statistical Methods for Machine Learning. Machine Learning Mastery.

Chen, S. et al. (2021). Risk-aware federated learning in crowdsensing systems. arXiv preprint arXiv:2101.01266.

Dincer, B., Zhu, Y., Craswell, N., and Zhang, M. (2016). Risk-sensitive evaluation and learning to rank using multiple baselines. In Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval, pages 483–492.

Divi, S., Lin, Y.-S., Farrukh, H., and Celik, Z. B. (2021). New metrics to evaluate the performance and fairness of personalized federated learning.

Hastie, T., Tibshirani, R., and Friedman, J. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer.

Hejazinia, M. et al. (2022). FEL: High capacity learning for recommendation and ranking via federated ensemble learning. arXiv preprint arXiv:2206.03852.

Järvelin, K. and Kekäläinen, J. (2002). Cumulated gain-based evaluation of IR techniques. ACM Transactions on Information Systems (TOIS), 20(4):422–446.

Jeong, J., Kim, H., Park, J., Lee, S., and Yoon, D. N. (2022). FedCC: Boosting robustness of federated learning against model poisoning attacks. In Proceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security (CCS), pages 861–875. ACM.

Jiang, J. C., Kantarci, B., Oktug, S., and Soyata, T. (2020). Federated learning in smart city sensing: Challenges and opportunities. Sensors, 20(21):6230.

Karimireddy, S. P., Kale, S., Mohri, M., Reddi, S., Stich, S. U., and Suresh, A. T. (2020). SCAFFOLD: Stochastic controlled averaging for federated learning. In International Conference on Machine Learning (ICML).

Köppel, M., Segner, A., Wagener, M., Pensel, L., Karwath, A., and Kramer, S. (2019). Pairwise learning to rank by neural networks revisited: Reconstruction, theoretical analysis and practical performance. arXiv preprint arXiv:1909.02768.

Li, T., Sahu, A. K., Talwalkar, A., and Smith, V. (2020). Federated optimization in heterogeneous networks. In Proceedings of Machine Learning and Systems, pages 429–450.

Liu, S., Celik, E., and Widmer, J. (2021). Label-aware aggregation for improved federated learning. In Proceedings of the 2021 20th ACM/IEEE International Conference on Information Processing in Sensor Networks (IPSN), pages 1–13. IEEE.

Neto, H. N. C., Mattos, D. M. F., and Fernandes, N. C. (2020). Privacidade do usuário em aprendizado colaborativo: Federated learning, da teoria à prática. In Simpósio Brasileiro de Segurança da Informação e de Sistemas Computacionais (SBSEG).

Qin, T. and Liu, T. (2013). Introducing LETOR 4.0 datasets. CoRR, abs/1306.2597.

Rodrigues, P. H. S., de Sousa, D. X., França, C., Rabbi, G., Couto Rosa, T., and Gonçalves, M. A. (2025). Risk-sensitive optimization of neural deep learning ranking models with applications in ad-hoc retrieval and recommender systems. Information Processing & Management, 62(4):104126.

Rodrigues, P. H. S., Xavier Sousa, D., Couto Rosa, T., and Gonçalves, M. A. (2022). Risk-sensitive deep neural learning to rank. In ACM SIGIR Conference, SIGIR ’22, pages 803–813.

Spiegelhalter, D. (2024). The Art of Uncertainty: How to Navigate Chance, Ignorance, Risk and Luck. Pelican Books.

Tong, Y. et al. (2021). An efficient approach for cross-silo federated learning to rank. In Proceedings of the IEEE International Conference on Data Engineering (ICDE).

Voorhees, E. M. (1999). The TREC-8 question answering track report. In Proceedings of the Eighth Text Retrieval Conference (TREC-8). National Institute of Standards and Technology (NIST).

Wang, J. and Liu, M. (2020). Tackling the objective inconsistency problem in heterogeneous federated optimization. In NeurIPS.

Wang, L., Bennett, P. N., and Collins-Thompson, K. (2012). Robust ranking models via risk-sensitive optimization. In Proceedings of the 35th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’12, pages 761–770, New York, NY, USA. Association for Computing Machinery.

Wang, S. and Zuccon, G. (2022). Is non-IID data a threat in federated online learning to rank? In ACM SIGIR Conference, SIGIR ’22, pages 2801–2813.

Wang, Y., Li, T.-Y., Wang, D., and Zhu, M. (2013). A theoretical analysis of NDCG type ranking measures. Journal of Machine Learning Research, 14:25–54.

Zhao, S. et al. (2024). Federated risk-aware learning with central sensitivity estimation. arXiv preprint arXiv:2502.17694.