ResilienceBench-Operator: A Kubernetes Extension for Orchestrating Resilience Experiments on Microservice Applications
Abstract
Microservice-based applications often rely on resiliency patterns such as Retry and Circuit Breaker to mitigate the impact of failures in service-to-service communication. However, tooling support for systematically and reproducibly evaluating the performance impact of these patterns in realistic deployment environments remains limited. This paper presents ResilienceBench-Operator, a Kubernetes-native tool that orchestrates resilience experiments directly on microservice applications deployed in Kubernetes clusters. ResilienceBench-Operator evolves the original ResilienceBench tool, which focused on controlled environments with predefined client-server interactions. The new version introduces a fully declarative approach for defining test spaces: it automatically expands them into concrete test scenarios, injects failures, and configures resiliency strategies across real service dependencies. This paper describes the motivation, architecture, and main functionalities of ResilienceBench-Operator, and illustrates how it enables systematic evaluation of resiliency strategies in a representative microservice system. Demo video: https://youtu.be/ZSQcx6Ab37w.
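To make the idea of expanding a declarative test space into concrete test scenarios more tangible, the following is a minimal sketch in Python. It assumes the expansion is a Cartesian product over the declared parameter values, which the abstract does not confirm. The parameter names (`pattern`, `failure_rate`, `concurrent_users`) and the function `expand_test_space` are hypothetical and do not reflect the tool's actual resource schema or API.

```python
from itertools import product


def expand_test_space(space):
    """Expand a mapping of parameter -> candidate values into scenarios.

    Assumption: every combination of values is one scenario
    (a Cartesian product). The real tool may expand differently.
    """
    names = list(space)
    value_lists = [space[name] for name in names]
    return [dict(zip(names, combo)) for combo in product(*value_lists)]


# Hypothetical test space; the keys are illustrative only.
test_space = {
    "pattern": ["retry", "circuit_breaker"],
    "failure_rate": [0.0, 0.25, 0.5],
    "concurrent_users": [50, 100],
}

scenarios = expand_test_space(test_space)
for scenario in scenarios:
    print(scenario)
```

Under these assumptions, a test space with 2 patterns, 3 failure rates, and 2 load levels yields 2 × 3 × 2 = 12 scenarios, each of which would correspond to one experiment run.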
