FedDALE: Efficient Anomaly Detection in Distributed Logs with Federated Small Language Models

Gabriel U. Talasso; Allan M. de Souza; Leandro A. Villas

doi:10.5753/wiarc.2026.22936

Gabriel U. Talasso UNICAMP
Allan M. de Souza UNICAMP
Leandro A. Villas UNICAMP

Abstract

As distributed systems become increasingly ubiquitous, failures and attacks grow more frequent, motivating the development of techniques to mitigate these vulnerabilities. Log analysis is a promising approach to anomaly detection, but it poses significant challenges related to scale, the sequential and textual nature of the data, its distribution across multiple devices, and the presence of sensitive information. In addition, low-resource devices in the network are penalized by the high training, transmission, and usage costs of current solutions. In this work, we present FedDALE, a federated method that is efficient in both communication and computation for unsupervised anomaly detection in logs using small language models (SLMs). The approach combines (i) unsupervised local training of models with efficient fine-tuning, (ii) federated knowledge transfer based on prediction and data filtering, and (iii) training of a smaller, more efficient student model that aggregates the knowledge of the distributed teachers. Experiments, whose implementation is available, show that FedDALE achieves detection performance comparable to large federated models (F1 above 90%), while reducing communication costs by up to 82% and inference latency by 71%.
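The three stages listed in the abstract can be illustrated with a toy sketch. This is not the FedDALE implementation: the abstract does not specify the models, the filtering rule, or the data used for transfer. The sketch therefore assumes a hypothetical shared pool of unlabeled log sequences, replaces the language models with a simple bigram model (`BigramModel`), and filters by majority vote (`filter_by_votes`, `min_agreement`). All of these names and choices are illustrative assumptions. Model capacity is not modeled either, so the student here has the same form as the teachers, whereas the paper describes a smaller student.

```python
def bigrams(seq):
    """Consecutive event pairs of one log sequence."""
    return list(zip(seq, seq[1:]))


class BigramModel:
    """Toy stand-in for a language model: remembers transitions seen in training."""

    def __init__(self):
        self.seen = set()

    def fit(self, sequences):
        for seq in sequences:
            self.seen.update(bigrams(seq))
        return self

    def score(self, seq):
        """Anomaly score: fraction of transitions never seen in training."""
        pairs = bigrams(seq)
        if not pairs:
            return 0.0
        return sum(p not in self.seen for p in pairs) / len(pairs)


def filter_by_votes(teachers, pool, min_agreement=0.5):
    """Keep pool sequences that enough teachers predict as normal (score 0).

    Assumed filtering rule: each teacher contributes one boolean vote per
    sequence, so predictions are exchanged instead of model weights.
    """
    kept = []
    for seq in pool:
        votes = [t.score(seq) == 0.0 for t in teachers]
        if sum(votes) / len(votes) >= min_agreement:
            kept.append(seq)
    return kept


# (i) Unsupervised local training: each client fits a teacher on its own logs.
client_logs = [
    [["open", "read", "close"], ["open", "read", "write", "close"]],
    [["open", "write", "close"], ["open", "read", "close"]],
    [["open", "read", "write", "close"], ["open", "write", "close"]],
]
teachers = [BigramModel().fit(logs) for logs in client_logs]

# (ii) Knowledge transfer: teachers predict on a shared unlabeled pool
# (hypothetical), and sequences without enough "normal" votes are filtered out.
pool = [
    ["open", "read", "close"],
    ["open", "write", "close"],
    ["open", "error", "close"],
    ["open", "read", "write", "close"],
]
kept = filter_by_votes(teachers, pool)

# (iii) Student training: a single model learns from the filtered sequences.
student = BigramModel().fit(kept)

for seq in pool:
    print(seq, "anomaly score:", student.score(seq))
```

In this toy setting no single client has seen every normal transition, yet the student trained on the filtered pool covers all of them and still flags the sequence containing `error`. The only information leaving each client is one vote per pool sequence, which is the intuition behind replacing weight exchange with prediction exchange.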
