DOI: 10.1145/3613372.3613409
research-article

How The Retry Pattern Impacts Application Performance: A Controlled Experiment

Published: 25 September 2023

ABSTRACT

Distributed application developers typically use resiliency patterns such as Retry, Circuit Breaker, and Fail Fast to handle remote service failures. However, little research has examined how these patterns may impact performance under varying operational conditions. This paper presents a controlled experiment that assesses the performance of over 100 Retry pattern configurations, implemented in Java with Resilience4j and in C# with Polly, under different workloads and failure rates. Our experimental results indicate that increasing any of the three Retry parameters investigated (i.e., the initial backoff delay, the backoff delay multiplier, and the maximum number of retries) reduces response time but raises execution time, with effects intensifying exponentially as failure rates grow. An analysis using a state-of-the-art model explainer reveals that the impact of the initial backoff delay is twice that of the other parameters at low to moderate failure rates, whereas at high failure rates the effects are more balanced. These findings hold for both Resilience4j and Polly, with Polly's impact being slightly higher due to subtle implementation differences. Our results can benefit both distributed application developers and researchers: developers can use our findings to tailor the Retry pattern to their applications' needs, and researchers can build on our work to deepen the collective understanding of the impact and implications of resiliency patterns.
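
The three Retry parameters investigated can be made concrete with a minimal, library-independent sketch in Java, one of the two languages studied. It assumes the usual exponential-backoff scheme in which retry k (counting from 0) waits initialDelay × multiplier^k, so that if all n retries fail the total wait is initialDelay × (multiplier^n - 1) / (multiplier - 1) for multiplier > 1. The class and method names (RetryWithBackoff, backoffDelays, totalBackoffMs, callWithRetry) are hypothetical: this is not the Resilience4j or Polly API, and it omits jitter, exception filtering, and each library's own attempt-counting conventions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;

// Minimal, library-independent sketch of the Retry pattern with exponential backoff.
// All names are hypothetical; this is not Resilience4j or Polly code.
final class RetryWithBackoff {

    private RetryWithBackoff() {}

    /** Pluggable sleep so callers (and tests) can observe delays without waiting. */
    interface Sleeper {
        void sleep(long millis) throws InterruptedException;
    }

    /** Delay before retry k (k = 0 .. maxRetries - 1): initialDelayMs * multiplier^k, rounded. */
    static List<Long> backoffDelays(long initialDelayMs, double multiplier, int maxRetries) {
        // Assumption of this sketch: delays never shrink, so multiplier must be >= 1.
        if (initialDelayMs < 0 || multiplier < 1.0 || maxRetries < 0) {
            throw new IllegalArgumentException("invalid backoff parameters");
        }
        List<Long> delays = new ArrayList<>();
        double delay = initialDelayMs;
        for (int k = 0; k < maxRetries; k++) {
            delays.add(Math.round(delay));
            delay *= multiplier;
        }
        return delays;
    }

    /** Total time spent waiting between attempts when every attempt fails. */
    static long totalBackoffMs(long initialDelayMs, double multiplier, int maxRetries) {
        long total = 0;
        for (long d : backoffDelays(initialDelayMs, multiplier, maxRetries)) {
            total += d;
        }
        return total;
    }

    /**
     * Makes one initial call plus up to maxRetries retries, sleeping the matching
     * backoff delay before each retry. Rethrows the last failure if all attempts fail.
     */
    static <T> T callWithRetry(Callable<T> operation, long initialDelayMs, double multiplier,
                               int maxRetries, Sleeper sleeper) throws Exception {
        List<Long> delays = backoffDelays(initialDelayMs, multiplier, maxRetries);
        Exception last = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            if (attempt > 0) {
                sleeper.sleep(delays.get(attempt - 1)); // back off before each retry
            }
            try {
                return operation.call();
            } catch (Exception e) {
                last = e; // record the failure; retry if attempts remain
            }
        }
        throw last; // non-null: the loop runs at least once because maxRetries >= 0
    }

    public static void main(String[] args) throws Exception {
        // Initial delay 100 ms, multiplier 2, 4 retries.
        System.out.println(backoffDelays(100, 2.0, 4));  // [100, 200, 400, 800]
        System.out.println(totalBackoffMs(100, 2.0, 4)); // 1500

        // Hypothetical remote call that fails twice, then succeeds.
        int[] calls = {0};
        List<Long> waited = new ArrayList<>();
        String result = callWithRetry(() -> {
            calls[0]++;
            if (calls[0] < 3) {
                throw new IllegalStateException("transient failure");
            }
            return "ok";
        }, 100, 2.0, 4, waited::add);
        // Prints: ok after 3 calls, waited [100, 200]
        System.out.println(result + " after " + calls[0] + " calls, waited " + waited);
    }
}
```

In this sketch, raising any of the three parameters never decreases totalBackoffMs, the longest a caller can spend waiting between attempts before the final failure is reported; with a 100 ms initial delay, a multiplier of 2, and 4 retries, that wait already adds up to 1,500 ms.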

Published in

SBES '23: Proceedings of the XXXVII Brazilian Symposium on Software Engineering
September 2023
570 pages
ISBN: 9798400707872
DOI: 10.1145/3613372

Copyright © 2023 ACM

Publisher: Association for Computing Machinery, New York, NY, United States

Qualifiers: research-article, Research, Refereed limited

Overall acceptance rate: 147 of 427 submissions (34%)