High-Throughput Multi-Leader Paxos Consensus with Insanely Scalable SMR
Resumo
Distributed consensus protocols are essential building blocks for the development of distributed systems. They facilitate critical functionalities such as coordination in mutual exclusion and election algorithms, ensuring total order delivery in broadcast communication, and enabling active replication in approaches like State Machine Replication (SMR). Paxos has emerged as the most prominent distributed consensus algorithm, inspiring numerous optimizations over the past decades. In particular, some enhancements focus on improving Paxos’s throughput and scalability by addressing its single-leader bottleneck through multi-leader and leaderless variants. This paper examines key optimizations related to quorum sizes, reconfiguration, message exchange overhead, and the decentralization of the leader role, with an emphasis on multi-leader and leaderless approaches. Additionally, we develop a high-throughput multi-leader Paxos implementation using the ISS (Insanely Scalable SMR) framework. We benchmark this implementation against other multi-leader and leaderless protocols, including WPaxos and EPaxos. Experimental results demonstrate that the proposed implementation achieves nearly twice the throughput of WPaxos and EPaxos, albeit with increased latency due to ISS bucket rotation overheads.
Referências
Ailijiang, A., Charapko, A., Demirbas, M., and Kosar, T. (2020). WPaxos: Wide Area Network Flexible Consensus. IEEE Transactions on Parallel and Distributed Systems, 31(1):211–223.
Cachin, C., Guerraoui, R., and Rodrigues, L. (2014). Introduction to Reliable and Secure Distributed Programming. Springer Publishing Company, Incorporated, 2nd edition.
Castro, M. and Liskov, B. (1999). Practical Byzantine fault tolerance. In Proceedings of the Third Symposium on Operating Systems Design and Implementation, pages 173–186.
Charapko, A., Ailijiang, A., and Demirbas, M. (2021). Pigpaxos: Devouring the communication bottlenecks in distributed consensus. In Proceedings of the 2021 International Conference on Management of Data, pages 235–247.
Howard, H., Malkhi, D., and Spiegelman, A. (2016). Flexible Paxos: Quorum intersection revisited. arXiv preprint arXiv:1608.06696.
Jalili, P., Primi, M., Schiper, N., and Pedone, F. (2010). Ring Paxos: A high-throughput atomic broadcast protocol. In Proceedings of the International Conference on Dependable Systems and Networks, pages 527–536.
Lamport, L. (1998). The part-time parliament. ACM Transactions on Computer Systems, 16(2):133–169.
Lamport, L. (2001). Paxos made simple. SIGACT News, 32.
Lamport, L. (2006). Fast Paxos. Distributed Computing, 19(2):79–103.
Lamport, L., Malkhi, D., and Zhou, L. (2009). Vertical Paxos and primary-backup replication. Technical Report MSR-TR-2009-63, Microsoft Research.
Lamport, L. and Massa, M. (2004). Cheap Paxos. In International Conference on Dependable Systems and Networks (DSN 2004).
Mao, Y., Junqueira, F. P., and Marzullo, K. (2008). Mencius: Building efficient replicated state machines for WANs. In Proceedings of the 8th USENIX Conference on Operating Systems Design and Implementation, pages 369–384.
Moraru, I., Andersen, D. G., and Kaminsky, M. (2013). There is more consensus in egalitarian parliaments. In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles, pages 358–372.
Ongaro, D. and Ousterhout, J. (2014). In search of an understandable consensus algorithm. In Proceedings of the 2014 USENIX Conference on USENIX Annual Technical Conference, pages 305–320.
Regis, S. and Mendizabal, O. (2022). Análise comparativa do algoritmo Paxos e suas variações. In Proceedings of the 23rd Workshop on Testing and Fault Tolerance, pages 71–84.
Stathakopoulou, C., Pavlovic, M., and Vukolić, M. (2022). State machine replication scalability made simple. In Proceedings of the Seventeenth European Conference on Computer Systems, pages 17–33.
White, B., Lepreau, J., Stoller, L., Ricci, R., Guruprasad, S., Newbold, M., Hibler, M., Barb, C., and Joglekar, A. (2002). An integrated experimental environment for distributed systems and networks. ACM SIGOPS Operating Systems Review, 36:255–270.