Leveraging Large Language Models for Anomaly Detection in Microservices Architectures

  • Diego Frazatto Pedroso (USP)
  • Luís Almeida (University of Porto)
  • William Akihiro Alves Aisawa (USP)
  • Inês Dutra (University of Porto)
  • Sarita Mazzini Bruschi (USP)

Abstract


Cloud computing has become a key enabler of scalable and high-performance applications, allowing systems to be deployed rapidly. At the same time, the increasing sophistication of cloud-native environments brings new challenges related to system dependability. Ensuring resilience under such conditions is a fundamental responsibility of IT providers, who must safeguard service continuity and operational stability. The widespread use of microservice-based designs has created an ecosystem with a growing number of interacting components, including frameworks, application layers, hypervisors, and orchestration platforms. This distributed and layered environment produces a massive volume of log data originating from heterogeneous sources. Without automated support, extracting useful insights from these logs becomes a highly complex task. One promising direction to mitigate this challenge is the use of Machine Learning, particularly methods grounded in Large Language Models (LLMs), which can dynamically detect recurring structures and anomalies in event streams. Building on this idea, our work introduces an anomaly detection framework deployed within a microservices environment running on Kubernetes with Istio. The framework integrates an LLM trained on a diverse set of fault scenarios. To create these scenarios, we relied on Chaos Mesh for fault injection and Locust for workload stress testing. The evaluation confirmed that the model achieved high accuracy in identifying anomalies. It consistently detected all injected faults, although a small number of false positives were observed. Importantly, these false alarms remained at acceptable levels, highlighting the approach’s practical applicability.
Keywords: Cloud computing, Virtual machine monitors, Large language models, Microservice architectures, Computer architecture, Stability analysis, Anomaly detection, Stress, Testing, Resilience, LLM, Microservices, AWS
Published
28/10/2025
PEDROSO, Diego Frazatto; ALMEIDA, Luís; AISAWA, William Akihiro Alves; DUTRA, Inês; BRUSCHI, Sarita Mazzini. Leveraging Large Language Models for Anomaly Detection in Microservices Architectures. In: WORKSHOP ON CLOUD COMPUTING (WCC) - INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE AND HIGH PERFORMANCE COMPUTING (SBAC-PAD), 37., 2025, Bonito/MS. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 92-99.